2026-01-23 01:58:00,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mda-mem5 
2026-01-23 01:58:00,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mda-mem5 
2026-01-23 01:58:00,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x1494f420ef10>}
2026-01-23 01:58:00,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-23 01:58:00,640 baseline-bpql-mda-noisy-hopper:91 [WARNING]: args.assumed_delay != args.horizon: 5 != 32
2026-01-23 01:58:00,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-23 01:58:00,657 baseline-bpql-mda-noisy-hopper:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-23 01:58:00,658 baseline-bpql-mda-noisy-hopper:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:58:00,663 baseline-bpql-mda-noisy-hopper:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(3, 384, batch_first=True)
)
2026-01-23 01:58:01,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-23 01:58:01,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-23 02:01:24,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:24,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 12.16902 ± 0.575
2026-01-23 02:01:24,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [12.997083, 13.126436, 11.887091, 11.613817, 11.590014, 12.408147, 11.423986, 12.157795, 12.667856, 11.817988]
2026-01-23 02:01:24,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [16.0, 16.0, 15.0, 15.0, 15.0, 16.0, 15.0, 15.0, 16.0, 15.0]
2026-01-23 02:01:24,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (12.17) for latency DatasetOffice
2026-01-23 02:01:24,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 35 minutes, 36 seconds)
2026-01-23 02:05:00,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:03,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 282.37180 ± 37.463
2026-01-23 02:05:03,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [271.8475, 310.5452, 237.98448, 222.13702, 233.05226, 279.32584, 307.58658, 325.19647, 311.75006, 324.2927]
2026-01-23 02:05:03,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [232.0, 270.0, 198.0, 189.0, 197.0, 242.0, 268.0, 281.0, 270.0, 281.0]
2026-01-23 02:05:03,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (282.37) for latency DatasetOffice
2026-01-23 02:05:03,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 44 minutes, 58 seconds)
2026-01-23 02:08:38,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:39,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 138.71136 ± 76.860
2026-01-23 02:08:39,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [101.5097, 244.15025, 55.86061, 37.005386, 293.58368, 155.96077, 90.64958, 140.22403, 171.00789, 97.16176]
2026-01-23 02:08:39,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [55.0, 109.0, 35.0, 39.0, 162.0, 76.0, 52.0, 74.0, 97.0, 53.0]
2026-01-23 02:08:39,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 43 minutes, 59 seconds)
2026-01-23 02:12:17,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:18,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 215.43051 ± 19.238
2026-01-23 02:12:18,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [249.5911, 239.10062, 197.9559, 207.50119, 188.80061, 208.69662, 215.2574, 192.93318, 221.47644, 232.99208]
2026-01-23 02:12:18,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [125.0, 122.0, 104.0, 110.0, 96.0, 110.0, 112.0, 102.0, 114.0, 118.0]
2026-01-23 02:12:18,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 42 minutes, 50 seconds)
2026-01-23 02:15:56,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:00,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 427.54541 ± 239.364
2026-01-23 02:16:00,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [798.33405, 197.75145, 236.11041, 151.28828, 448.38785, 882.6589, 554.4447, 241.18733, 442.93723, 322.35406]
2026-01-23 02:16:00,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [601.0, 187.0, 158.0, 117.0, 265.0, 520.0, 353.0, 161.0, 260.0, 188.0]
2026-01-23 02:16:00,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (427.55) for latency DatasetOffice
2026-01-23 02:16:00,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 41 minutes, 34 seconds)
2026-01-23 02:19:32,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:35,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 258.47174 ± 178.767
2026-01-23 02:19:35,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [139.63089, 136.17186, 408.92252, 150.27461, 148.55789, 588.05255, 155.05156, 171.15555, 570.04254, 116.85747]
2026-01-23 02:19:35,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [104.0, 133.0, 304.0, 109.0, 102.0, 350.0, 115.0, 105.0, 453.0, 75.0]
2026-01-23 02:19:35,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 41 minutes, 44 seconds)
2026-01-23 02:23:11,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:12,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 190.74622 ± 90.408
2026-01-23 02:23:12,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [358.3667, 179.28188, 170.21474, 213.6013, 143.77315, 318.12762, 209.07695, 42.997234, 81.358635, 190.66393]
2026-01-23 02:23:12,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [157.0, 98.0, 95.0, 108.0, 82.0, 149.0, 115.0, 25.0, 50.0, 112.0]
2026-01-23 02:23:12,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 37 minutes, 32 seconds)
2026-01-23 02:26:45,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:48,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 434.32422 ± 118.775
2026-01-23 02:26:48,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [301.15, 317.43063, 430.8081, 569.6272, 236.39513, 373.42776, 471.6618, 480.2573, 611.22687, 551.25757]
2026-01-23 02:26:48,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [176.0, 213.0, 235.0, 308.0, 134.0, 210.0, 246.0, 243.0, 320.0, 274.0]
2026-01-23 02:26:48,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (434.32) for latency DatasetOffice
2026-01-23 02:26:48,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 33 minutes, 50 seconds)
2026-01-23 02:30:23,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:26,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 295.02484 ± 116.738
2026-01-23 02:30:26,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [434.1372, 473.07077, 227.5229, 197.82498, 233.23656, 176.66057, 397.09494, 163.85028, 433.3212, 213.5288]
2026-01-23 02:30:26,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [272.0, 285.0, 190.0, 163.0, 160.0, 121.0, 283.0, 115.0, 233.0, 181.0]
2026-01-23 02:30:26,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 29 minutes, 55 seconds)
2026-01-23 02:34:04,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1047.86353 ± 484.997
2026-01-23 02:34:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1569.1212, 1325.6198, 1037.5381, 573.3579, 2128.6577, 837.29456, 502.872, 577.8222, 1092.3088, 834.0443]
2026-01-23 02:34:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [633.0, 547.0, 453.0, 242.0, 1000.0, 344.0, 208.0, 248.0, 530.0, 382.0]
2026-01-23 02:34:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (1047.86) for latency DatasetOffice
2026-01-23 02:34:11,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 27 minutes, 16 seconds)
2026-01-23 02:37:41,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:43,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 462.40326 ± 128.577
2026-01-23 02:37:43,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [520.8302, 476.75354, 509.69153, 511.19, 492.57535, 498.32156, 526.08765, 500.64127, 509.17606, 78.76584]
2026-01-23 02:37:43,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [206.0, 187.0, 199.0, 200.0, 189.0, 199.0, 211.0, 197.0, 195.0, 45.0]
2026-01-23 02:37:43,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 22 minutes, 50 seconds)
2026-01-23 02:41:21,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:25,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 647.25818 ± 260.186
2026-01-23 02:41:25,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [152.83209, 674.67975, 1273.4967, 772.66437, 633.87134, 550.37177, 576.2077, 583.4177, 674.34924, 580.69135]
2026-01-23 02:41:25,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [81.0, 234.0, 477.0, 354.0, 218.0, 191.0, 191.0, 197.0, 231.0, 200.0]
2026-01-23 02:41:25,301 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 20 minutes, 28 seconds)
2026-01-23 02:44:58,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:45:05,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1163.42883 ± 382.076
2026-01-23 02:45:05,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [893.6764, 1402.2236, 860.545, 816.2389, 1135.2815, 1643.7119, 965.82477, 667.63275, 1921.3184, 1327.8348]
2026-01-23 02:45:05,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [391.0, 566.0, 349.0, 344.0, 404.0, 613.0, 388.0, 243.0, 705.0, 528.0]
2026-01-23 02:45:05,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (1163.43) for latency DatasetOffice
2026-01-23 02:45:05,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 18 minutes, 1 second)
2026-01-23 02:48:38,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:48:42,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 702.88849 ± 234.059
2026-01-23 02:48:42,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [930.2482, 444.14218, 725.73645, 856.80536, 667.5545, 327.40637, 657.74646, 1194.3567, 649.8953, 574.9934]
2026-01-23 02:48:42,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [360.0, 209.0, 319.0, 337.0, 250.0, 152.0, 270.0, 478.0, 261.0, 203.0]
2026-01-23 02:48:42,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 14 minutes, 13 seconds)
2026-01-23 02:52:15,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:52:19,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 744.20392 ± 423.446
2026-01-23 02:52:19,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [67.62415, 893.4539, 567.3473, 1667.6774, 584.4634, 609.6279, 809.3258, 894.5949, 256.47202, 1091.452]
2026-01-23 02:52:19,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [77.0, 369.0, 249.0, 716.0, 254.0, 260.0, 334.0, 375.0, 119.0, 433.0]
2026-01-23 02:52:19,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 8 minutes, 27 seconds)
2026-01-23 02:56:03,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:56:11,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1318.89648 ± 780.830
2026-01-23 02:56:11,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [909.6067, 618.0072, 433.45975, 687.7488, 1817.8853, 2402.688, 706.30334, 2168.4893, 2561.9006, 882.8771]
2026-01-23 02:56:11,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [332.0, 292.0, 211.0, 315.0, 726.0, 1000.0, 252.0, 935.0, 1000.0, 317.0]
2026-01-23 02:56:11,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (1318.90) for latency DatasetOffice
2026-01-23 02:56:11,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 10 minutes, 1 second)
2026-01-23 02:59:39,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:49,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1788.55005 ± 858.377
2026-01-23 02:59:49,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2863.5056, 808.99164, 1475.5944, 2316.323, 1419.3785, 673.6168, 2332.323, 595.83905, 2942.4534, 2457.4739]
2026-01-23 02:59:49,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 303.0, 539.0, 939.0, 512.0, 257.0, 816.0, 251.0, 1000.0, 1000.0]
2026-01-23 02:59:49,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (1788.55) for latency DatasetOffice
2026-01-23 02:59:49,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 5 minutes, 26 seconds)
2026-01-23 03:03:27,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:03:32,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 827.02637 ± 1001.563
2026-01-23 03:03:32,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2502.0828, 44.314648, 71.04672, 146.11455, 68.66288, 90.572205, 100.52689, 2698.2324, 1150.2911, 1398.4198]
2026-01-23 03:03:32,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 26.0, 44.0, 85.0, 42.0, 52.0, 70.0, 1000.0, 413.0, 458.0]
2026-01-23 03:03:32,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 2 minutes, 40 seconds)
2026-01-23 03:07:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:07:12,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1438.53137 ± 639.943
2026-01-23 03:07:12,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1275.5979, 2190.3855, 1250.9042, 3015.036, 1238.3844, 913.8241, 1266.2465, 1481.4116, 925.727, 827.79724]
2026-01-23 03:07:12,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [398.0, 708.0, 394.0, 1000.0, 390.0, 294.0, 399.0, 484.0, 300.0, 289.0]
2026-01-23 03:07:12,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 59 minutes, 40 seconds)
2026-01-23 03:10:47,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:10:56,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1672.73218 ± 815.805
2026-01-23 03:10:56,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [618.9505, 53.09303, 2508.3757, 1874.4338, 1584.8835, 1845.9805, 1159.8208, 2648.9773, 2627.3113, 1805.4958]
2026-01-23 03:10:56,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [249.0, 32.0, 886.0, 689.0, 577.0, 626.0, 383.0, 1000.0, 1000.0, 663.0]
2026-01-23 03:10:56,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 57 minutes, 41 seconds)
2026-01-23 03:14:36,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:14:40,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 831.59753 ± 228.802
2026-01-23 03:14:40,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1266.4312, 773.3784, 847.53156, 938.3787, 746.5899, 925.73224, 804.18726, 870.44446, 859.6494, 283.65244]
2026-01-23 03:14:40,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [413.0, 264.0, 278.0, 309.0, 262.0, 301.0, 268.0, 285.0, 280.0, 123.0]
2026-01-23 03:14:40,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 52 minutes, 5 seconds)
2026-01-23 03:18:13,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:21,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1593.81982 ± 735.538
2026-01-23 03:18:21,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [668.23773, 876.6438, 453.16443, 1020.94794, 2192.8035, 1866.9025, 2005.1956, 1985.2919, 2087.737, 2781.2732]
2026-01-23 03:18:21,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [240.0, 299.0, 181.0, 327.0, 808.0, 674.0, 732.0, 655.0, 742.0, 1000.0]
2026-01-23 03:18:21,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 49 minutes, 6 seconds)
2026-01-23 03:22:09,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:22:19,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2002.18591 ± 687.115
2026-01-23 03:22:19,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1786.3206, 1833.0295, 2302.316, 2926.293, 1354.1251, 1209.5201, 2921.6597, 1217.2201, 2971.5725, 1499.8035]
2026-01-23 03:22:19,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [575.0, 586.0, 756.0, 1000.0, 448.0, 395.0, 1000.0, 389.0, 1000.0, 489.0]
2026-01-23 03:22:19,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (2002.19) for latency DatasetOffice
2026-01-23 03:22:19,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 49 minutes, 17 seconds)
2026-01-23 03:25:55,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:26:08,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2623.42920 ± 550.836
2026-01-23 03:26:08,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2981.5254, 2955.9414, 2968.673, 1983.8025, 2823.3665, 2899.5415, 1429.2847, 3199.8044, 2071.736, 2920.6167]
2026-01-23 03:26:08,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 653.0, 944.0, 1000.0, 465.0, 1000.0, 657.0, 1000.0]
2026-01-23 03:26:08,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (2623.43) for latency DatasetOffice
2026-01-23 03:26:08,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 47 minutes, 52 seconds)
2026-01-23 03:29:28,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:29:36,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1422.78711 ± 795.557
2026-01-23 03:29:36,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [25.25663, 1438.6643, 150.25952, 1450.5979, 1628.8402, 2722.114, 976.27075, 1927.8376, 1898.8063, 2009.2241]
2026-01-23 03:29:36,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [31.0, 543.0, 80.0, 499.0, 547.0, 1000.0, 385.0, 644.0, 675.0, 683.0]
2026-01-23 03:29:36,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 40 minutes, 1 second)
2026-01-23 03:33:11,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:33:21,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2083.17676 ± 978.086
2026-01-23 03:33:21,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2950.663, 3010.1824, 1246.4236, 2992.7974, 3054.3582, 1324.0059, 1084.8577, 249.91367, 2824.7544, 2093.8113]
2026-01-23 03:33:21,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 418.0, 1000.0, 1000.0, 439.0, 382.0, 125.0, 1000.0, 698.0]
2026-01-23 03:33:21,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 36 minutes, 40 seconds)
2026-01-23 03:37:08,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:37:21,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2660.37427 ± 637.654
2026-01-23 03:37:21,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3081.4004, 2968.1982, 3092.902, 1493.4629, 2928.0344, 3070.7358, 1358.4026, 2528.0027, 3007.0784, 3075.5232]
2026-01-23 03:37:21,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 494.0, 1000.0, 1000.0, 464.0, 816.0, 1000.0, 1000.0]
2026-01-23 03:37:21,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (2660.37) for latency DatasetOffice
2026-01-23 03:37:21,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 37 minutes, 30 seconds)
2026-01-23 03:40:58,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:41:11,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2533.85571 ± 455.296
2026-01-23 03:41:11,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2215.408, 2874.455, 2447.4827, 2956.272, 3018.0225, 2937.1582, 1818.2908, 3015.6213, 1937.1046, 2118.74]
2026-01-23 03:41:11,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [762.0, 1000.0, 819.0, 1000.0, 1000.0, 1000.0, 604.0, 1000.0, 650.0, 691.0]
2026-01-23 03:41:11,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 31 minutes, 34 seconds)
2026-01-23 03:44:47,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:44:59,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2630.85864 ± 1032.806
2026-01-23 03:44:59,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3275.6067, 3239.0276, 3280.4082, 3199.4163, 1363.0875, 3230.895, 3274.5303, 201.81433, 3275.4, 1968.4]
2026-01-23 03:44:59,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 491.0, 1000.0, 1000.0, 106.0, 1000.0, 626.0]
2026-01-23 03:44:59,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 27 minutes, 37 seconds)
2026-01-23 03:48:29,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:48:38,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1808.86487 ± 1256.992
2026-01-23 03:48:38,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3249.37, 1140.1581, 3249.2605, 1108.848, 733.74005, 385.44485, 3312.0483, 3302.218, 1518.3555, 89.20527]
2026-01-23 03:48:38,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 366.0, 1000.0, 359.0, 257.0, 161.0, 1000.0, 1000.0, 485.0, 51.0]
2026-01-23 03:48:38,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 26 minutes, 27 seconds)
2026-01-23 03:52:12,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:52:24,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2604.72925 ± 858.772
2026-01-23 03:52:24,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2993.2317, 1301.9865, 3185.4321, 3223.854, 3195.8164, 3148.3516, 3203.7087, 1402.2086, 3202.086, 1190.6174]
2026-01-23 03:52:24,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [927.0, 395.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 447.0, 1000.0, 374.0]
2026-01-23 03:52:24,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 22 minutes, 41 seconds)
2026-01-23 03:55:57,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:56:11,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2989.34253 ± 597.361
2026-01-23 03:56:11,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3185.5352, 3206.2646, 3225.6992, 3162.7964, 3204.128, 3159.9475, 3226.9995, 3155.8315, 1198.8749, 3167.3499]
2026-01-23 03:56:11,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 408.0, 1000.0]
2026-01-23 03:56:11,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (2989.34) for latency DatasetOffice
2026-01-23 03:56:11,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 16 minutes, 10 seconds)
2026-01-23 03:59:52,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:00:01,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1884.67700 ± 1499.690
2026-01-23 04:00:01,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3170.5608, 3200.3079, 3209.2869, 3177.2822, 3232.5525, 2617.163, 21.662745, 29.709997, 137.06993, 51.173107]
2026-01-23 04:00:01,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 825.0, 20.0, 29.0, 75.0, 34.0]
2026-01-23 04:00:01,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 12 minutes, 31 seconds)
2026-01-23 04:03:32,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:03:44,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2812.28564 ± 1021.347
2026-01-23 04:03:44,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3282.152, 3313.4976, 3268.4158, 3302.6987, 3304.213, 169.10321, 1566.2614, 3304.1016, 3315.1584, 3297.2559]
2026-01-23 04:03:44,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 88.0, 506.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:03:44,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 7 minutes, 34 seconds)
2026-01-23 04:07:27,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:07:38,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2525.08447 ± 1108.634
2026-01-23 04:07:38,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3359.7878, 3310.748, 3357.988, 3343.1719, 1060.4308, 3343.9888, 1487.0635, 3394.185, 309.97556, 2283.5085]
2026-01-23 04:07:38,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 340.0, 1000.0, 484.0, 1000.0, 139.0, 686.0]
2026-01-23 04:07:38,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 7 minutes, 1 second)
2026-01-23 04:11:08,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:11:15,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1532.31482 ± 1500.388
2026-01-23 04:11:15,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1539.5297, 177.5074, 48.344357, 185.68163, 79.77675, 95.98879, 3276.2104, 3295.2986, 3291.555, 3333.2563]
2026-01-23 04:11:15,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [491.0, 85.0, 36.0, 92.0, 50.0, 55.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:11:15,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 1 minute, 26 seconds)
2026-01-23 04:15:04,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:15:19,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3178.71777 ± 235.822
2026-01-23 04:15:19,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3288.2087, 2474.6165, 3253.6956, 3279.54, 3241.9312, 3220.4458, 3266.117, 3261.8586, 3218.5493, 3282.2146]
2026-01-23 04:15:19,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 789.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:15:19,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (3178.72) for latency DatasetOffice
2026-01-23 04:15:19,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 57 seconds)
2026-01-23 04:18:41,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:18:47,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1368.27246 ± 1495.458
2026-01-23 04:18:47,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [119.412994, 3348.581, 119.15506, 3269.215, 405.44913, 49.83553, 121.78951, 2742.4253, 3382.0486, 124.81222]
2026-01-23 04:18:47,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [69.0, 1000.0, 67.0, 1000.0, 165.0, 43.0, 70.0, 821.0, 1000.0, 70.0]
2026-01-23 04:18:47,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 52 minutes, 39 seconds)
2026-01-23 04:22:27,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:22:34,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1557.61694 ± 1333.317
2026-01-23 04:22:34,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1255.3036, 3207.7644, 3215.1584, 2780.1348, 1310.7574, 3192.0752, 430.7902, 36.80505, 104.43637, 42.944103]
2026-01-23 04:22:34,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [415.0, 1000.0, 1000.0, 883.0, 421.0, 1000.0, 181.0, 35.0, 67.0, 25.0]
2026-01-23 04:22:34,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 49 minutes, 45 seconds)
2026-01-23 04:26:11,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:26:26,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3330.37646 ± 8.246
2026-01-23 04:26:26,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3335.4636, 3341.4966, 3332.6304, 3317.529, 3320.4055, 3327.3752, 3339.4668, 3330.9233, 3320.1162, 3338.36]
2026-01-23 04:26:26,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:26:26,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (3330.38) for latency DatasetOffice
2026-01-23 04:26:26,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 45 minutes, 35 seconds)
2026-01-23 04:30:13,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:30:26,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2960.92432 ± 523.535
2026-01-23 04:30:26,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3004.3457, 3297.394, 1852.0077, 3199.8835, 3267.5771, 3249.7822, 3246.3699, 3250.4426, 2002.3585, 3239.0793]
2026-01-23 04:30:26,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [940.0, 1000.0, 570.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 647.0, 1000.0]
2026-01-23 04:30:26,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 46 minutes, 23 seconds)
2026-01-23 04:34:06,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:34:16,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2182.31470 ± 1235.966
2026-01-23 04:34:16,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3076.3281, 3362.144, 2059.8176, 1986.478, 36.428326, 932.5895, 439.73923, 3365.5107, 3213.2444, 3350.867]
2026-01-23 04:34:16,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [938.0, 1000.0, 635.0, 641.0, 41.0, 323.0, 186.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:34:16,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 39 minutes, 48 seconds)
2026-01-23 04:37:44,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:37:58,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2943.92676 ± 775.919
2026-01-23 04:37:58,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3267.2976, 3202.665, 3164.2073, 3231.5564, 3179.7937, 3244.0942, 3174.0894, 618.3229, 3193.1482, 3164.092]
2026-01-23 04:37:58,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 231.0, 1000.0, 1000.0]
2026-01-23 04:37:58,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 38 minutes, 35 seconds)
2026-01-23 04:41:21,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:41:33,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2823.64331 ± 881.668
2026-01-23 04:41:33,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3322.024, 3306.4495, 3276.3486, 3332.0576, 735.59924, 3340.1348, 3230.5706, 1491.8925, 2867.088, 3334.2666]
2026-01-23 04:41:33,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 273.0, 1000.0, 1000.0, 458.0, 877.0, 1000.0]
2026-01-23 04:41:33,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 32 minutes, 39 seconds)
2026-01-23 04:45:17,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:45:31,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3214.56006 ± 323.610
2026-01-23 04:45:31,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3325.9028, 2250.8235, 3366.8298, 3343.1377, 3325.2178, 3261.8965, 3252.4504, 3387.1565, 3315.4658, 3316.7236]
2026-01-23 04:45:31,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 663.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:45:31,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 29 minutes, 57 seconds)
2026-01-23 04:49:00,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:49:13,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2971.95581 ± 713.742
2026-01-23 04:49:13,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1159.7579, 3367.203, 3340.3484, 3220.5046, 3364.315, 3413.2993, 3268.8967, 2052.1511, 3267.749, 3265.3328]
2026-01-23 04:49:13,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [391.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 641.0, 1000.0, 1000.0]
2026-01-23 04:49:13,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 22 minutes, 51 seconds)
2026-01-23 04:53:01,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:53:07,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1247.25122 ± 1410.154
2026-01-23 04:53:07,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3303.1775, 3221.9727, 361.64383, 3299.485, 1744.4001, 25.317574, 51.4486, 203.8612, 202.44194, 58.763832]
2026-01-23 04:53:07,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 139.0, 1000.0, 551.0, 24.0, 37.0, 89.0, 100.0, 35.0]
2026-01-23 04:53:07,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 19 minutes, 52 seconds)
2026-01-23 04:56:33,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:56:43,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2319.90356 ± 1266.827
2026-01-23 04:56:43,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3251.16, 2165.9023, 3266.033, 3237.538, 3309.6172, 3234.9783, 3235.1108, 254.9068, 15.61379, 1228.1765]
2026-01-23 04:56:43,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 665.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 110.0, 15.0, 390.0]
2026-01-23 04:56:43,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 15 minutes, 6 seconds)
2026-01-23 05:00:13,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:00:26,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3038.03271 ± 504.596
2026-01-23 05:00:26,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3203.028, 3305.7837, 3252.6785, 3199.6, 3284.908, 3249.1885, 3240.8909, 1568.8859, 3224.7627, 2850.6016]
2026-01-23 05:00:26,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 518.0, 1000.0, 881.0]
2026-01-23 05:00:26,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 12 minutes, 36 seconds)
2026-01-23 05:03:56,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:04:07,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2389.16528 ± 1310.982
2026-01-23 05:04:07,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3381.702, 3354.777, 1453.0819, 3359.621, 3358.9504, 3363.182, 3348.5696, 2085.9185, 138.3257, 47.5233]
2026-01-23 05:04:07,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 452.0, 1000.0, 1000.0, 1000.0, 1000.0, 673.0, 70.0, 28.0]
2026-01-23 05:04:07,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 5 minutes, 57 seconds)
2026-01-23 05:07:40,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:07:53,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3174.88037 ± 366.541
2026-01-23 05:07:53,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3359.857, 3341.0588, 3344.7834, 3350.1987, 2997.3489, 3247.6968, 2121.818, 3380.3086, 3276.7556, 3328.9775]
2026-01-23 05:07:53,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 921.0, 1000.0, 629.0, 996.0, 1000.0, 1000.0]
2026-01-23 05:07:53,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 2 minutes, 54 seconds)
2026-01-23 05:11:26,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:11:39,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3045.40869 ± 661.996
2026-01-23 05:11:39,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3322.9866, 3347.0862, 3378.5002, 3368.4558, 3338.1577, 3326.5217, 3251.1592, 3325.1084, 1162.9069, 2633.2024]
2026-01-23 05:11:39,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 371.0, 811.0]
2026-01-23 05:11:39,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 57 minutes, 56 seconds)
2026-01-23 05:15:23,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:15:31,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1784.44885 ± 1224.064
2026-01-23 05:15:31,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3347.1528, 1815.6843, 3395.6082, 1992.5961, 2007.1124, 35.04882, 1173.4794, 387.41763, 352.07486, 3338.314]
2026-01-23 05:15:31,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 547.0, 1000.0, 611.0, 641.0, 45.0, 388.0, 151.0, 144.0, 1000.0]
2026-01-23 05:15:31,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 56 minutes, 40 seconds)
2026-01-23 05:19:00,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:19:13,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2973.57275 ± 807.407
2026-01-23 05:19:13,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1620.9883, 3349.1484, 3371.5012, 3390.1387, 3352.158, 3386.0542, 3392.4822, 3356.7588, 3389.065, 1127.4327]
2026-01-23 05:19:13,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [485.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 368.0]
2026-01-23 05:19:13,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 52 minutes, 39 seconds)
2026-01-23 05:22:54,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:23:06,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2780.69580 ± 1107.879
2026-01-23 05:23:06,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3371.1252, 3366.9937, 3305.4312, 3335.739, 3272.8901, 774.5266, 3314.126, 3347.7869, 3346.894, 371.44324]
2026-01-23 05:23:06,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 265.0, 1000.0, 1000.0, 1000.0, 146.0]
2026-01-23 05:23:06,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 50 minutes, 56 seconds)
2026-01-23 05:26:25,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:26:38,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3008.06104 ± 638.361
2026-01-23 05:26:38,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3313.5483, 1197.6187, 3244.8726, 3212.3677, 3277.499, 3324.7256, 3305.7827, 3306.842, 3301.0728, 2596.2798]
2026-01-23 05:26:38,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 396.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 782.0]
2026-01-23 05:26:38,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 44 minutes, 54 seconds)
2026-01-23 05:30:16,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:30:27,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2795.17725 ± 1155.810
2026-01-23 05:30:27,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3281.705, 3329.2622, 3387.7266, 1015.36273, 3336.2275, 3421.9753, 3358.753, 38.005848, 3359.5962, 3423.1577]
2026-01-23 05:30:27,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 336.0, 1000.0, 1000.0, 1000.0, 30.0, 1000.0, 1000.0]
2026-01-23 05:30:28,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 41 minutes, 43 seconds)
2026-01-23 05:33:56,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:34:06,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2439.40063 ± 1245.641
2026-01-23 05:34:06,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3331.621, 1973.2361, 3378.4346, 3364.2563, 3110.5505, 3424.95, 3350.5068, 2146.741, 93.01058, 220.6984]
2026-01-23 05:34:06,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 586.0, 1000.0, 1000.0, 914.0, 1000.0, 1000.0, 664.0, 54.0, 98.0]
2026-01-23 05:34:06,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 36 minutes, 10 seconds)
2026-01-23 05:37:38,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:37:51,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3169.43213 ± 601.122
2026-01-23 05:37:51,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3309.0652, 3449.5518, 3333.4265, 3365.921, 3368.9443, 3383.126, 1369.9813, 3319.2185, 3399.8901, 3395.1965]
2026-01-23 05:37:51,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 449.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:37:51,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 32 minutes, 51 seconds)
2026-01-23 05:41:26,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:41:39,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3114.71143 ± 623.791
2026-01-23 05:41:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3307.4436, 3329.7202, 3309.808, 3348.4536, 3310.8765, 3313.6428, 1245.0553, 3277.517, 3320.6077, 3383.9875]
2026-01-23 05:41:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 401.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:41:39,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 28 minutes, 24 seconds)
2026-01-23 05:45:00,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:45:06,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1602.41467 ± 1561.937
2026-01-23 05:45:06,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1924.4054, 3385.659, 3439.9607, 3407.737, 3375.0234, 55.43779, 194.94572, 16.637585, 159.52342, 64.81673]
2026-01-23 05:45:06,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [552.0, 1000.0, 1000.0, 1000.0, 1000.0, 32.0, 100.0, 18.0, 79.0, 37.0]
2026-01-23 05:45:06,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 24 minutes, 6 seconds)
2026-01-23 05:48:36,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:48:49,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2878.29126 ± 958.062
2026-01-23 05:48:49,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3264.4932, 3415.5427, 406.8365, 3360.1794, 3321.003, 1703.8444, 3219.4558, 3340.8533, 3396.6924, 3354.0137]
2026-01-23 05:48:49,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 154.0, 1000.0, 1000.0, 523.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:48:49,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 19 minutes, 27 seconds)
2026-01-23 05:52:09,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:52:23,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3316.80737 ± 208.980
2026-01-23 05:52:23,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3215.074, 3337.5562, 2720.6924, 3421.602, 3399.2756, 3417.4368, 3399.8284, 3414.5383, 3373.5325, 3468.5386]
2026-01-23 05:52:23,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 808.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:52:23,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 15 minutes, 13 seconds)
2026-01-23 05:56:05,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:56:11,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1575.99316 ± 1368.373
2026-01-23 05:56:11,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3488.122, 3603.3176, 3445.1267, 1913.6154, 32.825233, 750.9782, 1016.00494, 42.777622, 1053.617, 413.54724]
2026-01-23 05:56:11,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 577.0, 45.0, 254.0, 328.0, 48.0, 364.0, 170.0]
2026-01-23 05:56:11,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 12 minutes, 3 seconds)
2026-01-23 05:59:35,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:59:49,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3349.31396 ± 25.978
2026-01-23 05:59:49,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3362.6653, 3337.8015, 3306.4688, 3347.9841, 3366.0564, 3398.566, 3327.2036, 3371.6877, 3319.7642, 3354.9443]
2026-01-23 05:59:49,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:59:49,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (3349.31) for latency DatasetOffice
2026-01-23 05:59:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 7 minutes, 6 seconds)
2026-01-23 06:03:23,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:03:34,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2521.20776 ± 1269.463
2026-01-23 06:03:34,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3320.3337, 3359.9448, 284.73743, 736.5606, 3388.7146, 3333.6406, 3386.2559, 752.68994, 3265.2405, 3383.9602]
2026-01-23 06:03:34,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 125.0, 263.0, 1000.0, 1000.0, 1000.0, 266.0, 1000.0, 1000.0]
2026-01-23 06:03:34,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 5 minutes, 29 seconds)
2026-01-23 06:07:07,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:07:20,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3348.88525 ± 182.386
2026-01-23 06:07:20,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3430.2092, 2807.7786, 3440.4272, 3393.5894, 3442.4756, 3389.4277, 3386.2954, 3441.081, 3359.208, 3398.3608]
2026-01-23 06:07:20,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 798.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:07:20,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 2 minutes, 18 seconds)
2026-01-23 06:10:50,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:11:01,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2645.40112 ± 819.057
2026-01-23 06:11:01,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1047.8109, 3342.0166, 2728.6519, 3306.0461, 1418.5878, 3313.3171, 3307.585, 2526.464, 3367.7148, 2095.8174]
2026-01-23 06:11:01,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [349.0, 1000.0, 822.0, 1000.0, 470.0, 1000.0, 1000.0, 782.0, 1000.0, 645.0]
2026-01-23 06:11:01,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 59 minutes, 15 seconds)
2026-01-23 06:14:23,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:14:34,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2471.60327 ± 1002.938
2026-01-23 06:14:34,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2084.0376, 3267.2563, 3202.9893, 3127.839, 3225.7966, 3358.5151, 1761.1658, 2891.2217, 1712.5931, 84.61849]
2026-01-23 06:14:34,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [668.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 584.0, 1000.0, 552.0, 70.0]
2026-01-23 06:14:34,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 53 minutes, 56 seconds)
2026-01-23 06:18:03,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:18:17,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3264.91577 ± 301.591
2026-01-23 06:18:17,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3396.744, 3379.6943, 3316.7021, 3377.0977, 3415.3142, 3386.9895, 2364.9207, 3344.2974, 3318.2874, 3349.1113]
2026-01-23 06:18:17,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 715.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:18:17,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 50 minutes, 46 seconds)
2026-01-23 06:21:52,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:22:02,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2427.62842 ± 1386.837
2026-01-23 06:22:02,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3291.5144, 3347.412, 3313.0386, 3342.351, 403.8165, 3350.0806, 3326.5278, 400.56635, 131.9352, 3369.0398]
2026-01-23 06:22:02,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 166.0, 1000.0, 1000.0, 168.0, 68.0, 1000.0]
2026-01-23 06:22:02,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 47 minutes, 10 seconds)
2026-01-23 06:25:38,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:25:46,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2026.23474 ± 1576.108
2026-01-23 06:25:46,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3399.1516, 3462.1716, 3415.172, 3380.7537, 3392.4094, 2774.078, 74.518745, 66.70119, 96.23749, 201.15237]
2026-01-23 06:25:46,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 819.0, 41.0, 37.0, 72.0, 93.0]
2026-01-23 06:25:46,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 43 minutes, 12 seconds)
2026-01-23 06:29:14,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:29:23,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2095.81299 ± 1241.654
2026-01-23 06:29:23,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3173.5247, 3257.4414, 777.8385, 3290.663, 3217.9653, 1589.9521, 19.869156, 1888.3994, 497.75195, 3244.726]
2026-01-23 06:29:23,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 286.0, 1000.0, 1000.0, 527.0, 18.0, 601.0, 178.0, 1000.0]
2026-01-23 06:29:23,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 39 minutes, 13 seconds)
2026-01-23 06:32:44,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:32:56,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2829.70630 ± 940.042
2026-01-23 06:32:56,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3385.9255, 441.66525, 3402.194, 3406.713, 3353.142, 3414.227, 1954.6676, 3241.3472, 3401.9849, 2295.1946]
2026-01-23 06:32:56,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 180.0, 1000.0, 1000.0, 1000.0, 1000.0, 607.0, 950.0, 1000.0, 702.0]
2026-01-23 06:32:56,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 35 minutes, 30 seconds)
2026-01-23 06:36:34,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:36:38,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 876.59064 ± 1333.786
2026-01-23 06:36:38,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3470.2778, 1417.6229, 55.300194, 15.464734, 120.22724, 208.93109, 41.029705, 41.14028, 26.044159, 3369.8684]
2026-01-23 06:36:38,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 441.0, 34.0, 16.0, 80.0, 99.0, 24.0, 24.0, 21.0, 1000.0]
2026-01-23 06:36:38,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 31 minutes, 47 seconds)
2026-01-23 06:40:09,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:40:11,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 397.21872 ± 26.253
2026-01-23 06:40:11,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [465.97504, 396.3665, 420.40274, 380.17676, 383.55945, 392.39523, 384.49463, 392.62292, 367.7962, 388.39804]
2026-01-23 06:40:11,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [165.0, 147.0, 146.0, 137.0, 134.0, 145.0, 142.0, 146.0, 134.0, 135.0]
2026-01-23 06:40:11,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 27 minutes, 4 seconds)
2026-01-23 06:43:41,715 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:43:53,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2822.50537 ± 1305.723
2026-01-23 06:43:53,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3473.7593, 3432.4167, 3517.2708, 3490.531, 405.36237, 28.81157, 3416.2683, 3488.1558, 3486.9253, 3485.5532]
2026-01-23 06:43:53,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 154.0, 21.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:43:53,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 23 minutes, 17 seconds)
2026-01-23 06:47:11,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:47:23,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2881.06934 ± 1103.172
2026-01-23 06:47:23,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3453.1292, 3463.7495, 684.493, 3389.6152, 3439.36, 3430.4473, 665.7329, 3449.653, 3412.4817, 3422.0308]
2026-01-23 06:47:23,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 241.0, 1000.0, 1000.0, 1000.0, 219.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:47:23,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 19 minutes, 10 seconds)
2026-01-23 06:51:06,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:51:20,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3444.03442 ± 17.359
2026-01-23 06:51:20,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3450.3687, 3467.3638, 3454.7764, 3451.6533, 3434.9631, 3459.7217, 3450.653, 3409.2192, 3443.4934, 3418.1292]
2026-01-23 06:51:20,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:51:20,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (3444.03) for latency DatasetOffice
2026-01-23 06:51:20,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 17 minutes, 15 seconds)
2026-01-23 06:54:51,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:55:04,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3426.30200 ± 15.033
2026-01-23 06:55:04,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3426.445, 3447.2048, 3436.4355, 3443.521, 3427.9558, 3405.0764, 3414.8682, 3399.1484, 3436.4224, 3425.9412]
2026-01-23 06:55:04,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 990.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:55:04,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 13 minutes, 45 seconds)
2026-01-23 06:58:22,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:58:24,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 420.25870 ± 984.050
2026-01-23 06:58:24,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3359.3486, 344.8713, 28.667824, 42.952244, 79.13573, 160.56516, 78.5785, 16.122284, 37.353497, 54.991707]
2026-01-23 06:58:24,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 150.0, 24.0, 25.0, 47.0, 78.0, 45.0, 16.0, 31.0, 32.0]
2026-01-23 06:58:24,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 9 minutes, 12 seconds)
2026-01-23 07:01:51,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:02:03,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2912.57837 ± 1086.020
2026-01-23 07:02:03,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3461.7393, 3469.2583, 434.9302, 3430.1558, 3464.1082, 3407.0476, 3464.578, 1086.0627, 3471.109, 3436.7937]
2026-01-23 07:02:03,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 168.0, 1000.0, 1000.0, 1000.0, 1000.0, 356.0, 1000.0, 1000.0]
2026-01-23 07:02:03,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 5 minutes, 24 seconds)
2026-01-23 07:05:38,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:05:51,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3296.48950 ± 580.456
2026-01-23 07:05:51,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3488.7925, 3469.0083, 3503.408, 3487.1555, 3488.3242, 3482.5828, 3484.8545, 1555.4839, 3489.042, 3516.2407]
2026-01-23 07:05:51,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 454.0, 1000.0, 1000.0]
2026-01-23 07:05:51,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 2 minutes, 47 seconds)
2026-01-23 07:09:29,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:09:36,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1618.51770 ± 1571.955
2026-01-23 07:09:36,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3414.9043, 3415.253, 3417.8208, 3393.9714, 2079.2158, 45.21579, 44.439148, 96.19179, 87.8591, 190.30595]
2026-01-23 07:09:36,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 633.0, 36.0, 26.0, 58.0, 59.0, 89.0]
2026-01-23 07:09:36,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 58 minutes, 26 seconds)
2026-01-23 07:13:02,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:13:16,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3429.73706 ± 63.546
2026-01-23 07:13:16,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3467.9832, 3447.1558, 3434.623, 3464.0344, 3241.295, 3447.4297, 3447.5793, 3439.1482, 3453.3757, 3454.7468]
2026-01-23 07:13:16,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 939.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:13:16,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 54 minutes, 33 seconds)
2026-01-23 07:16:44,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:16:57,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3026.73975 ± 896.645
2026-01-23 07:16:57,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3412.115, 3432.1091, 3435.6636, 500.75278, 3355.4468, 3424.6648, 3425.535, 3439.162, 3450.6746, 2391.2732]
2026-01-23 07:16:57,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 187.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 710.0]
2026-01-23 07:16:57,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 51 minutes, 56 seconds)
2026-01-23 07:20:30,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:20:38,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2034.93420 ± 1176.047
2026-01-23 07:20:38,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3376.8533, 2784.4824, 3409.0532, 2340.3826, 2056.1743, 26.984076, 1030.0825, 441.16965, 3328.9658, 1555.1935]
2026-01-23 07:20:38,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 827.0, 1000.0, 715.0, 644.0, 30.0, 362.0, 182.0, 1000.0, 457.0]
2026-01-23 07:20:39,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 48 minutes, 20 seconds)
2026-01-23 07:23:58,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:24:12,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3405.97192 ± 119.671
2026-01-23 07:24:12,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3431.9993, 3451.1814, 3436.0122, 3435.1814, 3446.3906, 3465.0667, 3452.297, 3456.7056, 3048.2705, 3436.613]
2026-01-23 07:24:12,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 890.0, 1000.0]
2026-01-23 07:24:12,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 44 minutes, 3 seconds)
2026-01-23 07:27:51,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:27:58,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1575.61926 ± 1327.025
2026-01-23 07:27:58,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [190.29376, 3440.5847, 3406.9124, 730.7909, 2123.9072, 3427.3545, 1015.6395, 194.9328, 1026.5511, 199.22557]
2026-01-23 07:27:58,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [91.0, 1000.0, 1000.0, 257.0, 632.0, 993.0, 329.0, 91.0, 339.0, 95.0]
2026-01-23 07:27:58,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 40 minutes, 24 seconds)
2026-01-23 07:31:23,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:31:31,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1814.86914 ± 1473.916
2026-01-23 07:31:31,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1706.7874, 149.19057, 49.880325, 182.71808, 94.30868, 3329.6902, 3403.5435, 2447.266, 3419.0203, 3366.2864]
2026-01-23 07:31:31,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [523.0, 90.0, 44.0, 91.0, 65.0, 1000.0, 1000.0, 729.0, 1000.0, 1000.0]
2026-01-23 07:31:31,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 36 minutes, 29 seconds)
2026-01-23 07:35:00,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:35:13,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3164.42505 ± 735.601
2026-01-23 07:35:13,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3423.0269, 959.9817, 3367.7334, 3425.0022, 3421.5347, 3453.3633, 3418.8486, 3404.7688, 3439.8582, 3330.1328]
2026-01-23 07:35:13,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 316.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:35:13,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 32 minutes, 53 seconds)
2026-01-23 07:38:38,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:38:51,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3105.33838 ± 725.591
2026-01-23 07:38:51,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3437.592, 3444.2292, 3448.583, 3435.3284, 2361.026, 3400.9187, 1155.8241, 3445.5662, 3477.9844, 3446.3333]
2026-01-23 07:38:51,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 704.0, 1000.0, 359.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:38:51,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 29 minutes, 8 seconds)
2026-01-23 07:42:21,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:42:31,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2245.02246 ± 1471.106
2026-01-23 07:42:31,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [35.759575, 1015.37085, 433.12842, 372.5114, 3393.9685, 3432.9746, 3430.8977, 3461.6926, 3442.0076, 3431.9133]
2026-01-23 07:42:31,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [41.0, 349.0, 175.0, 154.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:42:31,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 25 minutes, 37 seconds)
2026-01-23 07:46:10,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:46:22,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2827.78003 ± 1230.101
2026-01-23 07:46:22,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3450.0295, 3427.1287, 3439.2446, 727.5563, 45.72411, 3412.349, 3437.2798, 3443.6863, 3447.4897, 3447.312]
2026-01-23 07:46:22,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 248.0, 27.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:46:22,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 22 minutes, 5 seconds)
2026-01-23 07:49:52,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:50:05,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3011.11182 ± 939.330
2026-01-23 07:50:05,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2230.2769, 3383.9956, 3430.2834, 3430.8743, 407.57492, 3453.7532, 3457.943, 3440.6895, 3437.6714, 3438.056]
2026-01-23 07:50:05,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [666.0, 1000.0, 1000.0, 1000.0, 155.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:50:05,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 18 minutes, 34 seconds)
2026-01-23 07:53:35,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:53:47,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2923.47681 ± 936.386
2026-01-23 07:53:47,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3427.1592, 3411.8733, 3377.0898, 3087.5872, 3398.4285, 572.89813, 3406.2563, 3442.751, 1683.9495, 3426.775]
2026-01-23 07:53:47,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 903.0, 1000.0, 206.0, 1000.0, 1000.0, 522.0, 1000.0]
2026-01-23 07:53:47,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 14 minutes, 51 seconds)
2026-01-23 07:57:17,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:57:29,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2852.18213 ± 1176.003
2026-01-23 07:57:29,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3421.7185, 3448.1892, 3336.9539, 3442.388, 3448.5647, 3468.5725, 3476.2874, 260.1463, 764.1419, 3454.8572]
2026-01-23 07:57:29,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 113.0, 262.0, 1000.0]
2026-01-23 07:57:29,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 10 seconds)
2026-01-23 08:00:51,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:00:55,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 942.70361 ± 1351.740
2026-01-23 08:00:55,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3457.8235, 163.75322, 3446.0518, 1796.0663, 47.905746, 48.76885, 132.2579, 209.88979, 47.380936, 77.13728]
2026-01-23 08:00:55,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 78.0, 1000.0, 551.0, 39.0, 29.0, 69.0, 98.0, 28.0, 44.0]
2026-01-23 08:00:55,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 21 seconds)
2026-01-23 08:04:44,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:04:58,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 3311.37891 ± 438.178
2026-01-23 08:04:58,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3474.766, 3442.2886, 3454.5884, 3454.319, 3470.8362, 1997.5555, 3427.5388, 3477.2148, 3462.8423, 3451.8389]
2026-01-23 08:04:58,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 586.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:04:58,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 43 seconds)
2026-01-23 08:08:12,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:08:21,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2158.10449 ± 1420.621
2026-01-23 08:08:21,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3526.662, 459.93616, 3487.8174, 3494.7825, 3477.261, 799.7108, 231.21913, 3476.452, 2152.4592, 474.74515]
2026-01-23 08:08:21,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 174.0, 1000.0, 1000.0, 1000.0, 270.0, 103.0, 1000.0, 632.0, 180.0]
2026-01-23 08:08:21,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1299 [DEBUG]: Training session finished
