2025-05-13 09:06:33,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mda-mem32
2025-05-13 09:06:33,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mda-mem32
2025-05-13 09:06:33,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15303e536d10>}
2025-05-13 09:06:33,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:33,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-13 09:06:33,374 baseline-bpql-mda-noisy-hopper:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-13 09:06:33,374 baseline-bpql-mda-noisy-hopper:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:33,379 baseline-bpql-mda-noisy-hopper:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(3, 384, batch_first=True)
)
2025-05-13 09:06:34,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:34,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-13 09:09:59,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:10:00,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 30.28182 ± 3.523
2025-05-13 09:10:00,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [28.134254, 32.903095, 27.338253, 28.068712, 30.356348, 26.85779, 31.966724, 38.63306, 31.801678, 26.758286]
2025-05-13 09:10:00,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 29.0, 52.0, 53.0, 29.0, 33.0, 56.0, 54.0, 51.0]
2025-05-13 09:10:00,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (30.28) for latency ExtremeSparseL4U32
2025-05-13 09:10:00,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 40 minutes)
2025-05-13 09:13:28,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:13:31,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 126.17909 ± 61.944
2025-05-13 09:13:31,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [92.64907, 195.55084, 27.055214, 120.754036, 135.1949, 158.14021, 234.3631, 143.55322, 125.664215, 28.866037]
2025-05-13 09:13:31,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 168.0, 33.0, 132.0, 105.0, 162.0, 255.0, 124.0, 97.0, 38.0]
2025-05-13 09:13:31,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (126.18) for latency ExtremeSparseL4U32
2025-05-13 09:13:31,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 40 minutes, 56 seconds)
2025-05-13 09:17:05,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:17:07,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 109.92891 ± 48.766
2025-05-13 09:17:07,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [146.44824, 151.3824, 151.25223, 22.495888, 130.57558, 118.06198, 157.74518, 87.97275, 19.90945, 113.445435]
2025-05-13 09:17:07,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 120.0, 124.0, 28.0, 105.0, 89.0, 129.0, 75.0, 23.0, 85.0]
2025-05-13 09:17:07,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 41 minutes, 25 seconds)
2025-05-13 09:20:36,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:20:38,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 99.60777 ± 40.098
2025-05-13 09:20:38,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [102.654465, 131.11966, 29.276833, 115.728966, 159.8639, 28.723042, 98.338844, 92.38429, 133.28992, 104.69781]
2025-05-13 09:20:38,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 111.0, 32.0, 102.0, 128.0, 33.0, 78.0, 86.0, 137.0, 83.0]
2025-05-13 09:20:38,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 37 minutes, 38 seconds)
2025-05-13 09:24:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:24:13,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 133.34341 ± 68.513
2025-05-13 09:24:13,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [92.36017, 164.8936, 228.76587, 197.05081, 78.61335, 63.809402, 84.83114, 23.67011, 193.48828, 205.95143]
2025-05-13 09:24:13,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 100.0, 127.0, 121.0, 79.0, 76.0, 70.0, 27.0, 151.0, 128.0]
2025-05-13 09:24:13,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (133.34) for latency ExtremeSparseL4U32
2025-05-13 09:24:13,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 35 minutes, 31 seconds)
2025-05-13 09:27:43,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:27:44,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 83.62891 ± 23.146
2025-05-13 09:27:44,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [86.76042, 68.04719, 111.98843, 78.42662, 103.36977, 110.43864, 68.16317, 97.2147, 80.51583, 31.364391]
2025-05-13 09:27:44,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 61.0, 87.0, 62.0, 82.0, 74.0, 60.0, 71.0, 64.0, 32.0]
2025-05-13 09:27:44,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 33 minutes, 34 seconds)
2025-05-13 09:31:15,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:31:17,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 144.92020 ± 58.570
2025-05-13 09:31:17,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [119.36826, 124.39594, 236.84203, 29.802387, 123.04095, 122.43454, 124.202065, 238.69519, 149.43509, 180.98557]
2025-05-13 09:31:17,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 91.0, 120.0, 32.0, 82.0, 88.0, 81.0, 127.0, 96.0, 109.0]
2025-05-13 09:31:17,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (144.92) for latency ExtremeSparseL4U32
2025-05-13 09:31:17,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 30 minutes, 23 seconds)
2025-05-13 09:34:47,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:34:49,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 79.25948 ± 23.314
2025-05-13 09:34:49,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [120.94261, 69.38203, 67.267265, 90.90312, 79.87797, 100.703865, 86.18196, 71.80688, 26.751036, 78.77804]
2025-05-13 09:34:49,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 55.0, 49.0, 74.0, 63.0, 68.0, 65.0, 51.0, 29.0, 63.0]
2025-05-13 09:34:49,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 25 minutes, 33 seconds)
2025-05-13 09:38:21,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:38:22,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 81.84697 ± 33.209
2025-05-13 09:38:22,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [100.18205, 142.4098, 76.319565, 95.45408, 96.569786, 111.83202, 28.653923, 67.28691, 67.30865, 32.45296]
2025-05-13 09:38:22,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 94.0, 62.0, 69.0, 72.0, 82.0, 30.0, 56.0, 56.0, 32.0]
2025-05-13 09:38:22,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 22 minutes, 52 seconds)
2025-05-13 09:41:51,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:41:53,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 121.59955 ± 34.205
2025-05-13 09:41:53,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [121.66942, 173.89528, 63.323162, 139.46156, 79.45612, 124.49476, 133.29593, 84.53948, 131.06851, 164.79126]
2025-05-13 09:41:53,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 112.0, 56.0, 93.0, 60.0, 85.0, 88.0, 68.0, 86.0, 96.0]
2025-05-13 09:41:53,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 18 minutes)
2025-05-13 09:45:25,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:45:27,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 132.55167 ± 57.857
2025-05-13 09:45:27,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [28.462654, 207.5722, 127.52435, 135.67474, 176.64188, 144.47823, 187.52371, 158.17657, 26.666744, 132.79546]
2025-05-13 09:45:27,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 109.0, 85.0, 95.0, 110.0, 96.0, 112.0, 92.0, 31.0, 95.0]
2025-05-13 09:45:27,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 15 minutes, 11 seconds)
2025-05-13 09:48:59,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:49:01,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 118.12461 ± 13.815
2025-05-13 09:49:01,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [126.62277, 97.90208, 143.5863, 95.9468, 132.8987, 120.14653, 118.5737, 120.27271, 112.560326, 112.73616]
2025-05-13 09:49:01,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 107.0, 95.0, 73.0, 91.0, 105.0, 114.0, 119.0, 103.0, 88.0]
2025-05-13 09:49:01,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 12 minutes, 9 seconds)
2025-05-13 09:52:32,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:52:34,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 101.73804 ± 60.256
2025-05-13 09:52:34,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [181.4845, 215.50406, 26.234728, 118.83364, 98.534515, 86.29144, 131.78758, 33.31811, 95.335, 30.056784]
2025-05-13 09:52:34,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 116.0, 29.0, 81.0, 81.0, 69.0, 96.0, 32.0, 74.0, 32.0]
2025-05-13 09:52:34,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 8 minutes, 51 seconds)
2025-05-13 09:56:05,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:56:07,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 159.35881 ± 46.074
2025-05-13 09:56:07,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [137.87646, 96.76948, 148.95396, 140.3736, 132.32938, 110.59758, 245.13155, 230.19637, 189.17647, 162.1832]
2025-05-13 09:56:07,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 66.0, 90.0, 95.0, 89.0, 77.0, 128.0, 123.0, 110.0, 98.0]
2025-05-13 09:56:07,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (159.36) for latency ExtremeSparseL4U32
2025-05-13 09:56:07,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 5 minutes, 11 seconds)
2025-05-13 09:59:38,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:59:40,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 133.65117 ± 46.273
2025-05-13 09:59:40,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [121.28183, 110.6149, 98.06155, 123.29902, 152.95044, 185.7636, 30.437952, 200.98178, 160.45563, 152.66504]
2025-05-13 09:59:40,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 75.0, 69.0, 83.0, 95.0, 105.0, 31.0, 118.0, 103.0, 101.0]
2025-05-13 09:59:40,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 2 minutes, 14 seconds)
2025-05-13 10:03:12,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:03:15,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 170.30298 ± 35.948
2025-05-13 10:03:15,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [186.7224, 137.64354, 193.76808, 148.5269, 201.30394, 88.639755, 189.41734, 151.05957, 208.50546, 197.44281]
2025-05-13 10:03:15,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 93.0, 117.0, 100.0, 111.0, 65.0, 112.0, 101.0, 120.0, 107.0]
2025-05-13 10:03:15,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (170.30) for latency ExtremeSparseL4U32
2025-05-13 10:03:15,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 59 minutes, 2 seconds)
2025-05-13 10:06:45,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:06:47,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 136.32394 ± 65.564
2025-05-13 10:06:47,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [29.099297, 225.4932, 174.04941, 158.49245, 215.26022, 163.63214, 128.80135, 20.374126, 144.55452, 103.48256]
2025-05-13 10:06:47,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 138.0, 117.0, 97.0, 121.0, 97.0, 83.0, 25.0, 86.0, 81.0]
2025-05-13 10:06:47,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 54 minutes, 57 seconds)
2025-05-13 10:10:21,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:10:23,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 121.52785 ± 37.861
2025-05-13 10:10:23,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [136.87053, 147.90361, 111.2063, 28.752756, 177.41211, 124.37822, 130.52168, 147.02878, 92.95803, 118.24649]
2025-05-13 10:10:23,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 109.0, 87.0, 31.0, 110.0, 95.0, 96.0, 105.0, 77.0, 90.0]
2025-05-13 10:10:23,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 52 minutes, 10 seconds)
2025-05-13 10:13:53,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:13:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 154.31126 ± 37.540
2025-05-13 10:13:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [141.47963, 247.59566, 185.5789, 143.35606, 109.87651, 121.76375, 169.24586, 136.21239, 131.13615, 156.86777]
2025-05-13 10:13:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [110.0, 130.0, 116.0, 96.0, 95.0, 90.0, 110.0, 90.0, 93.0, 103.0]
2025-05-13 10:13:55,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 48 minutes, 23 seconds)
2025-05-13 10:17:27,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:17:29,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 142.90842 ± 53.845
2025-05-13 10:17:29,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [167.6768, 133.26108, 145.27188, 161.8141, 233.37025, 200.97119, 103.72723, 104.92988, 150.6167, 27.44504]
2025-05-13 10:17:29,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 84.0, 96.0, 98.0, 121.0, 119.0, 74.0, 70.0, 95.0, 29.0]
2025-05-13 10:17:29,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 45 minutes, 8 seconds)
2025-05-13 10:21:00,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:21:02,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 179.39830 ± 63.667
2025-05-13 10:21:02,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [180.79707, 252.27327, 157.97598, 129.44846, 136.38322, 325.14853, 171.42015, 209.04326, 123.76754, 107.725586]
2025-05-13 10:21:02,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 133.0, 98.0, 84.0, 85.0, 146.0, 106.0, 126.0, 82.0, 82.0]
2025-05-13 10:21:02,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (179.40) for latency ExtremeSparseL4U32
2025-05-13 10:21:02,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 41 minutes, 9 seconds)
2025-05-13 10:24:34,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:24:36,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 149.11397 ± 26.291
2025-05-13 10:24:36,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [120.337364, 135.33012, 201.28911, 195.88309, 144.23727, 136.30846, 131.62373, 127.69063, 153.20946, 145.23048]
2025-05-13 10:24:36,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 87.0, 110.0, 115.0, 108.0, 88.0, 79.0, 82.0, 119.0, 97.0]
2025-05-13 10:24:36,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 37 minutes, 48 seconds)
2025-05-13 10:28:05,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:28:08,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 142.42580 ± 48.738
2025-05-13 10:28:08,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [124.806114, 114.44106, 143.79198, 71.73561, 158.70146, 128.15448, 110.27162, 267.68893, 140.67584, 163.99084]
2025-05-13 10:28:08,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 88.0, 100.0, 65.0, 109.0, 90.0, 85.0, 144.0, 93.0, 99.0]
2025-05-13 10:28:08,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 33 minutes, 19 seconds)
2025-05-13 10:31:36,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:31:38,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 144.58705 ± 48.621
2025-05-13 10:31:38,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [141.8312, 125.33679, 124.20568, 133.28888, 108.618324, 145.37346, 92.683205, 278.88315, 166.17795, 129.47176]
2025-05-13 10:31:38,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 80.0, 100.0, 87.0, 78.0, 88.0, 66.0, 132.0, 97.0, 93.0]
2025-05-13 10:31:38,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 29 minutes, 25 seconds)
2025-05-13 10:35:10,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:35:12,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 170.35089 ± 48.379
2025-05-13 10:35:12,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [101.43564, 174.0698, 171.08525, 128.52788, 144.4782, 231.35541, 183.21286, 275.19766, 135.67523, 158.4711]
2025-05-13 10:35:12,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 100.0, 98.0, 84.0, 87.0, 120.0, 116.0, 134.0, 88.0, 101.0]
2025-05-13 10:35:12,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 25 minutes, 38 seconds)
2025-05-13 10:38:40,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:38:42,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 137.61533 ± 31.275
2025-05-13 10:38:42,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [173.12173, 106.195045, 90.483284, 140.79625, 142.38197, 164.7531, 188.87827, 148.25407, 123.8117, 97.47795]
2025-05-13 10:38:42,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 85.0, 67.0, 82.0, 93.0, 99.0, 103.0, 101.0, 78.0, 70.0]
2025-05-13 10:38:42,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 21 minutes, 24 seconds)
2025-05-13 10:42:12,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:42:14,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 130.00534 ± 45.550
2025-05-13 10:42:14,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [128.84293, 204.28217, 125.43451, 173.41084, 167.36986, 107.16187, 124.68309, 141.24599, 28.538357, 99.083725]
2025-05-13 10:42:14,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 106.0, 84.0, 97.0, 107.0, 75.0, 85.0, 86.0, 30.0, 77.0]
2025-05-13 10:42:14,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 17 minutes, 32 seconds)
2025-05-13 10:45:42,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:45:43,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 122.50200 ± 60.740
2025-05-13 10:45:43,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [121.90891, 21.035437, 121.9042, 92.80923, 95.33808, 175.79088, 191.29094, 210.31097, 162.2283, 32.403046]
2025-05-13 10:45:43,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 25.0, 84.0, 71.0, 74.0, 105.0, 110.0, 115.0, 90.0, 31.0]
2025-05-13 10:45:43,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 13 minutes, 23 seconds)
2025-05-13 10:49:12,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:49:14,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 129.30029 ± 61.544
2025-05-13 10:49:14,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [132.19934, 164.0941, 109.63228, 134.60202, 104.61725, 190.03926, 216.25754, 27.073643, 26.74689, 187.74065]
2025-05-13 10:49:14,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 96.0, 72.0, 96.0, 74.0, 108.0, 111.0, 29.0, 30.0, 109.0]
2025-05-13 10:49:14,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 9 minutes, 51 seconds)
2025-05-13 10:52:44,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:52:45,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 142.93100 ± 57.323
2025-05-13 10:52:45,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [133.50272, 229.24222, 22.85175, 155.04596, 124.59932, 195.9793, 187.53365, 93.20339, 183.30846, 104.04321]
2025-05-13 10:52:45,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 116.0, 29.0, 91.0, 86.0, 113.0, 101.0, 66.0, 113.0, 77.0]
2025-05-13 10:52:45,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 5 minutes, 51 seconds)
2025-05-13 10:56:15,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:56:17,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 127.17433 ± 56.443
2025-05-13 10:56:17,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [30.809423, 130.7826, 179.65187, 206.39468, 164.54414, 144.88852, 29.033148, 115.37619, 104.13338, 166.12923]
2025-05-13 10:56:17,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 91.0, 108.0, 112.0, 111.0, 110.0, 30.0, 90.0, 71.0, 105.0]
2025-05-13 10:56:17,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 2 minutes, 36 seconds)
2025-05-13 10:59:45,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:59:47,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 166.16940 ± 68.772
2025-05-13 10:59:47,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [119.23439, 168.38577, 344.4812, 203.9388, 91.9951, 119.11526, 133.83784, 139.1739, 136.59882, 204.93298]
2025-05-13 10:59:47,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 108.0, 151.0, 117.0, 79.0, 100.0, 94.0, 82.0, 99.0, 114.0]
2025-05-13 10:59:47,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 58 minutes, 45 seconds)
2025-05-13 11:03:16,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:03:17,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 150.16705 ± 83.923
2025-05-13 11:03:17,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [28.193022, 184.86986, 178.51051, 102.38089, 214.98056, 98.53154, 306.81833, 28.814182, 220.71986, 137.85185]
2025-05-13 11:03:17,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 99.0, 100.0, 80.0, 113.0, 70.0, 133.0, 32.0, 122.0, 88.0]
2025-05-13 11:03:17,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 55 minutes, 23 seconds)
2025-05-13 11:06:46,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:06:48,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 133.07089 ± 18.699
2025-05-13 11:06:48,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [151.59592, 165.55643, 120.61586, 118.08438, 127.74123, 145.06676, 144.57523, 96.905464, 136.99266, 123.57488]
2025-05-13 11:06:48,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 94.0, 81.0, 81.0, 86.0, 89.0, 93.0, 67.0, 83.0, 83.0]
2025-05-13 11:06:48,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 51 minutes, 46 seconds)
2025-05-13 11:10:18,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:10:20,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 156.36948 ± 86.335
2025-05-13 11:10:20,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [27.147612, 163.84724, 92.18952, 338.93158, 135.42918, 132.6267, 279.1295, 110.724686, 112.20136, 171.46745]
2025-05-13 11:10:20,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 98.0, 68.0, 151.0, 82.0, 90.0, 144.0, 76.0, 75.0, 92.0]
2025-05-13 11:10:20,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 48 minutes, 30 seconds)
2025-05-13 11:13:48,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:13:50,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 129.22755 ± 69.558
2025-05-13 11:13:50,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [144.85362, 23.79306, 26.334768, 122.80756, 156.46921, 151.60713, 191.7084, 115.38964, 90.29687, 269.01523]
2025-05-13 11:13:50,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 26.0, 30.0, 85.0, 101.0, 102.0, 108.0, 83.0, 62.0, 126.0]
2025-05-13 11:13:50,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 44 minutes, 33 seconds)
2025-05-13 11:17:19,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:17:21,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 126.49840 ± 79.891
2025-05-13 11:17:21,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [26.78304, 156.3833, 25.68738, 123.96876, 187.39906, 172.68608, 286.27347, 28.050161, 158.42828, 99.32433]
2025-05-13 11:17:21,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 93.0, 28.0, 86.0, 97.0, 105.0, 136.0, 32.0, 95.0, 77.0]
2025-05-13 11:17:21,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 41 minutes, 9 seconds)
2025-05-13 11:20:50,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:20:52,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 143.13322 ± 79.756
2025-05-13 11:20:52,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [126.73926, 107.82647, 117.781296, 202.89568, 21.883974, 100.97853, 92.5461, 191.99623, 333.88623, 134.79858]
2025-05-13 11:20:52,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 78.0, 80.0, 113.0, 25.0, 73.0, 72.0, 111.0, 163.0, 89.0]
2025-05-13 11:20:52,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 37 minutes, 53 seconds)
2025-05-13 11:24:20,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:24:22,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 146.25131 ± 45.810
2025-05-13 11:24:22,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [133.7207, 200.42665, 139.55971, 100.0104, 168.16331, 119.3783, 98.990974, 135.61424, 251.92873, 114.7202]
2025-05-13 11:24:22,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 111.0, 90.0, 71.0, 109.0, 76.0, 69.0, 91.0, 129.0, 83.0]
2025-05-13 11:24:22,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 34 minutes, 20 seconds)
2025-05-13 11:27:51,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:27:53,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 166.08534 ± 46.977
2025-05-13 11:27:53,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [134.73817, 215.87888, 269.25546, 205.09465, 117.416016, 131.71284, 163.31259, 121.94004, 165.9793, 135.52547]
2025-05-13 11:27:53,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 111.0, 134.0, 109.0, 81.0, 86.0, 95.0, 80.0, 97.0, 87.0]
2025-05-13 11:27:53,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 30 minutes, 35 seconds)
2025-05-13 11:31:22,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:31:24,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 140.48882 ± 79.507
2025-05-13 11:31:24,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [100.54211, 289.14545, 24.287874, 99.5314, 225.48624, 147.62366, 23.593948, 134.64586, 169.05507, 190.97652]
2025-05-13 11:31:24,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 133.0, 29.0, 73.0, 120.0, 92.0, 29.0, 91.0, 101.0, 102.0]
2025-05-13 11:31:24,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 27 minutes, 16 seconds)
2025-05-13 11:34:52,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:34:55,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 197.10031 ± 45.561
2025-05-13 11:34:55,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [210.19376, 117.36003, 206.40355, 225.93958, 169.06508, 180.08429, 153.8451, 188.85097, 224.4524, 294.80838]
2025-05-13 11:34:55,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 86.0, 116.0, 125.0, 99.0, 103.0, 92.0, 109.0, 118.0, 142.0]
2025-05-13 11:34:55,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (197.10) for latency ExtremeSparseL4U32
2025-05-13 11:34:55,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 23 minutes, 46 seconds)
2025-05-13 11:38:23,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:38:25,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 148.57994 ± 91.946
2025-05-13 11:38:25,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [197.36978, 108.78765, 143.1995, 368.21426, 28.68961, 132.25093, 120.29123, 181.37021, 178.98053, 26.645864]
2025-05-13 11:38:25,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 81.0, 105.0, 174.0, 31.0, 84.0, 85.0, 102.0, 112.0, 29.0]
2025-05-13 11:38:25,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 20 minutes, 7 seconds)
2025-05-13 11:41:55,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:41:57,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 159.07243 ± 61.415
2025-05-13 11:41:57,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [110.494156, 204.79576, 103.66629, 140.24559, 128.03566, 107.907326, 173.66393, 134.91487, 168.63696, 318.36374]
2025-05-13 11:41:57,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 116.0, 76.0, 90.0, 81.0, 78.0, 105.0, 95.0, 106.0, 143.0]
2025-05-13 11:41:57,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 16 minutes, 56 seconds)
2025-05-13 11:45:26,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:45:28,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 145.99596 ± 50.724
2025-05-13 11:45:28,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [277.40964, 176.41292, 116.72913, 151.45277, 104.11046, 114.299736, 133.9335, 120.914764, 95.32277, 169.37393]
2025-05-13 11:45:28,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 103.0, 77.0, 103.0, 79.0, 84.0, 93.0, 90.0, 70.0, 98.0]
2025-05-13 11:45:28,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 13 minutes, 27 seconds)
2025-05-13 11:48:58,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:49:00,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 167.52582 ± 57.263
2025-05-13 11:49:00,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [209.85031, 114.896545, 222.56798, 105.90118, 112.86572, 201.74986, 108.76857, 260.12158, 118.22808, 220.30852]
2025-05-13 11:49:00,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 84.0, 124.0, 75.0, 81.0, 112.0, 71.0, 129.0, 77.0, 116.0]
2025-05-13 11:49:00,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 10 minutes, 9 seconds)
2025-05-13 11:52:28,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:52:31,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 198.97305 ± 93.978
2025-05-13 11:52:31,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [149.9015, 182.45966, 248.24982, 174.72879, 369.59174, 175.08632, 107.22584, 22.434006, 290.47256, 269.58038]
2025-05-13 11:52:31,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 108.0, 138.0, 107.0, 163.0, 94.0, 89.0, 26.0, 139.0, 133.0]
2025-05-13 11:52:31,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (198.97) for latency ExtremeSparseL4U32
2025-05-13 11:52:31,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 6 minutes, 33 seconds)
2025-05-13 11:56:01,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:56:03,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 136.92274 ± 51.475
2025-05-13 11:56:03,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [212.56273, 128.49026, 116.26401, 154.45917, 108.8383, 112.25037, 135.57748, 200.26663, 177.29697, 23.221413]
2025-05-13 11:56:03,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 91.0, 80.0, 90.0, 79.0, 76.0, 87.0, 102.0, 103.0, 31.0]
2025-05-13 11:56:03,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 3 minutes, 24 seconds)
2025-05-13 11:59:33,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:59:35,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 191.71661 ± 47.834
2025-05-13 11:59:35,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [132.30647, 222.39998, 132.54025, 197.65405, 157.45561, 228.49806, 299.90326, 167.35384, 193.92378, 185.1307]
2025-05-13 11:59:35,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 120.0, 84.0, 114.0, 103.0, 114.0, 137.0, 116.0, 111.0, 102.0]
2025-05-13 11:59:35,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 59 minutes, 55 seconds)
2025-05-13 12:03:03,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:03:05,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 193.58902 ± 60.777
2025-05-13 12:03:05,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [148.44005, 154.43585, 246.30556, 150.81703, 344.54166, 159.27327, 171.70514, 184.73708, 233.47836, 142.15617]
2025-05-13 12:03:05,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [98.0, 94.0, 123.0, 96.0, 156.0, 89.0, 104.0, 107.0, 119.0, 86.0]
2025-05-13 12:03:05,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 56 minutes, 9 seconds)
2025-05-13 12:06:35,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:06:37,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 173.55708 ± 59.840
2025-05-13 12:06:37,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [215.80455, 191.00131, 31.174349, 148.26523, 265.7933, 184.68057, 150.24146, 208.99683, 135.54446, 204.06883]
2025-05-13 12:06:37,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 102.0, 32.0, 92.0, 131.0, 104.0, 96.0, 117.0, 89.0, 112.0]
2025-05-13 12:06:37,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 52 minutes, 40 seconds)
2025-05-13 12:10:06,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:10:08,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 192.06400 ± 75.073
2025-05-13 12:10:08,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [142.04668, 130.11073, 266.5758, 161.81105, 242.47202, 195.5688, 301.27734, 200.33191, 248.91711, 31.528454]
2025-05-13 12:10:08,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 95.0, 131.0, 97.0, 121.0, 118.0, 146.0, 113.0, 126.0, 30.0]
2025-05-13 12:10:08,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 49 minutes, 11 seconds)
2025-05-13 12:13:38,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:13:40,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 163.64151 ± 60.551
2025-05-13 12:13:40,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [185.89062, 194.2468, 196.30798, 240.04182, 154.1806, 200.5531, 209.81627, 117.56207, 20.572536, 117.243286]
2025-05-13 12:13:40,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 109.0, 108.0, 117.0, 87.0, 113.0, 116.0, 80.0, 23.0, 78.0]
2025-05-13 12:13:40,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 45 minutes, 37 seconds)
2025-05-13 12:17:08,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:17:10,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 161.07397 ± 67.658
2025-05-13 12:17:10,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [122.60197, 243.2123, 224.60008, 29.962349, 173.63058, 141.85913, 96.56209, 135.92291, 264.6275, 177.76077]
2025-05-13 12:17:10,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 135.0, 117.0, 31.0, 108.0, 85.0, 74.0, 92.0, 131.0, 109.0]
2025-05-13 12:17:10,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 41 minutes, 46 seconds)
2025-05-13 12:20:40,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:20:42,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 150.81116 ± 30.227
2025-05-13 12:20:42,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [137.28757, 191.03741, 131.39389, 134.42668, 200.17888, 168.39926, 114.10302, 154.69121, 171.97197, 104.621826]
2025-05-13 12:20:42,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [98.0, 122.0, 95.0, 88.0, 106.0, 105.0, 85.0, 94.0, 105.0, 75.0]
2025-05-13 12:20:42,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 38 minutes, 35 seconds)
2025-05-13 12:24:11,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:24:13,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 114.31712 ± 82.668
2025-05-13 12:24:13,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [21.806532, 275.1684, 166.2499, 127.41883, 171.48727, 127.04417, 26.286312, 28.594557, 21.852526, 177.26268]
2025-05-13 12:24:13,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 124.0, 93.0, 85.0, 104.0, 90.0, 29.0, 30.0, 29.0, 105.0]
2025-05-13 12:24:13,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 34 minutes, 51 seconds)
2025-05-13 12:27:42,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:27:44,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 166.45152 ± 53.779
2025-05-13 12:27:44,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [223.20346, 143.85158, 94.72909, 171.18369, 144.14862, 175.66917, 107.54393, 152.84811, 159.43837, 291.89917]
2025-05-13 12:27:44,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 82.0, 68.0, 99.0, 89.0, 104.0, 81.0, 93.0, 94.0, 136.0]
2025-05-13 12:27:44,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 31 minutes, 19 seconds)
2025-05-13 12:31:13,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:31:16,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 213.99423 ± 105.175
2025-05-13 12:31:16,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [209.23825, 116.22416, 158.2385, 438.47012, 284.66617, 221.09752, 210.50566, 178.12106, 295.33743, 28.043348]
2025-05-13 12:31:16,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 78.0, 104.0, 178.0, 140.0, 128.0, 113.0, 103.0, 139.0, 31.0]
2025-05-13 12:31:16,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (213.99) for latency ExtremeSparseL4U32
2025-05-13 12:31:16,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 27 minutes, 45 seconds)
2025-05-13 12:34:47,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:34:49,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 151.89249 ± 75.196
2025-05-13 12:34:49,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [264.24472, 24.728392, 114.55657, 160.60626, 201.78807, 24.776682, 177.44772, 221.4069, 194.85782, 134.51167]
2025-05-13 12:34:49,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 30.0, 88.0, 88.0, 108.0, 28.0, 110.0, 108.0, 107.0, 87.0]
2025-05-13 12:34:49,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 24 minutes, 37 seconds)
2025-05-13 12:38:18,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:38:20,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 224.34492 ± 90.117
2025-05-13 12:38:20,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [299.28787, 312.00052, 151.19048, 174.93385, 396.68906, 198.60663, 158.4509, 299.89786, 111.438934, 140.953]
2025-05-13 12:38:20,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 145.0, 86.0, 109.0, 184.0, 108.0, 99.0, 138.0, 71.0, 89.0]
2025-05-13 12:38:20,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (224.34) for latency ExtremeSparseL4U32
2025-05-13 12:38:20,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 21 minutes, 3 seconds)
2025-05-13 12:41:49,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:41:51,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 159.14780 ± 77.368
2025-05-13 12:41:51,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [292.99557, 22.628908, 28.543625, 148.2676, 173.07877, 188.25119, 211.6589, 149.74312, 171.42946, 204.88087]
2025-05-13 12:41:51,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 26.0, 30.0, 93.0, 95.0, 109.0, 110.0, 88.0, 109.0, 117.0]
2025-05-13 12:41:51,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 17 minutes, 31 seconds)
2025-05-13 12:45:20,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:45:22,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 197.99214 ± 118.670
2025-05-13 12:45:22,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [400.144, 237.36943, 419.3499, 134.97055, 144.0248, 218.94946, 115.95403, 29.324097, 130.93246, 148.90271]
2025-05-13 12:45:22,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [166.0, 120.0, 178.0, 93.0, 87.0, 120.0, 78.0, 31.0, 89.0, 92.0]
2025-05-13 12:45:22,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 14 minutes, 4 seconds)
2025-05-13 12:48:52,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:48:54,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 221.50505 ± 82.607
2025-05-13 12:48:55,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [144.26335, 346.97095, 107.58395, 213.68486, 245.74403, 293.18622, 126.84274, 277.74203, 317.59164, 141.44102]
2025-05-13 12:48:55,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 147.0, 70.0, 123.0, 124.0, 131.0, 91.0, 132.0, 147.0, 93.0]
2025-05-13 12:48:55,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 10 minutes, 36 seconds)
2025-05-13 12:52:24,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:52:27,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 191.10103 ± 57.677
2025-05-13 12:52:27,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [186.74861, 192.13693, 164.00911, 187.5397, 337.4154, 134.40845, 123.98922, 239.30994, 159.04538, 186.40747]
2025-05-13 12:52:27,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 111.0, 101.0, 111.0, 160.0, 86.0, 84.0, 123.0, 103.0, 107.0]
2025-05-13 12:52:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 6 minutes, 58 seconds)
2025-05-13 12:55:56,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:55:58,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 156.67212 ± 41.041
2025-05-13 12:55:58,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [135.14584, 195.29985, 171.53629, 243.64537, 125.13827, 158.38264, 113.6649, 158.79466, 171.85083, 93.26261]
2025-05-13 12:55:58,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 129.0, 103.0, 126.0, 76.0, 95.0, 77.0, 99.0, 97.0, 72.0]
2025-05-13 12:55:58,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 3 minutes, 22 seconds)
2025-05-13 12:59:28,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:59:30,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 159.85399 ± 99.613
2025-05-13 12:59:30,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [117.68142, 186.09027, 379.90488, 27.33623, 150.27167, 155.75175, 148.53214, 137.67694, 25.683514, 269.61105]
2025-05-13 12:59:30,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 109.0, 162.0, 32.0, 87.0, 102.0, 92.0, 94.0, 28.0, 135.0]
2025-05-13 12:59:30,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 2 seconds)
2025-05-13 13:03:00,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:03:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 143.62204 ± 84.030
2025-05-13 13:03:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [29.801027, 158.49593, 182.40675, 138.11507, 28.176964, 233.92587, 110.81339, 122.477234, 110.13665, 321.87152]
2025-05-13 13:03:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 99.0, 115.0, 91.0, 30.0, 125.0, 72.0, 89.0, 77.0, 151.0]
2025-05-13 13:03:02,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 56 minutes, 33 seconds)
2025-05-13 13:06:30,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:06:32,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 171.89413 ± 51.646
2025-05-13 13:06:32,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [263.81888, 79.66355, 253.31743, 174.74341, 166.48157, 149.7997, 177.62431, 149.00456, 127.42692, 177.0612]
2025-05-13 13:06:32,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 67.0, 125.0, 105.0, 98.0, 91.0, 103.0, 90.0, 84.0, 100.0]
2025-05-13 13:06:32,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 52 minutes, 51 seconds)
2025-05-13 13:10:01,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:10:03,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 202.48285 ± 82.270
2025-05-13 13:10:03,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [188.47543, 165.86208, 268.21396, 342.1994, 208.04765, 196.96428, 166.05646, 27.321238, 165.17186, 296.5161]
2025-05-13 13:10:03,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 103.0, 132.0, 162.0, 114.0, 108.0, 103.0, 30.0, 111.0, 141.0]
2025-05-13 13:10:03,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 49 minutes, 10 seconds)
2025-05-13 13:13:32,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:13:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 171.00639 ± 61.481
2025-05-13 13:13:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [118.082664, 137.59416, 167.9485, 131.585, 164.56227, 251.52806, 129.9133, 166.82195, 123.32702, 318.70105]
2025-05-13 13:13:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [76.0, 89.0, 104.0, 86.0, 103.0, 122.0, 84.0, 101.0, 84.0, 148.0]
2025-05-13 13:13:34,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 45 minutes, 39 seconds)
2025-05-13 13:17:06,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:17:08,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 178.94472 ± 84.421
2025-05-13 13:17:08,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [124.02451, 30.593023, 277.30087, 143.24115, 316.53577, 201.51028, 278.88388, 112.53846, 147.46155, 157.35762]
2025-05-13 13:17:08,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 32.0, 132.0, 87.0, 154.0, 115.0, 132.0, 82.0, 96.0, 94.0]
2025-05-13 13:17:08,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 42 minutes, 16 seconds)
2025-05-13 13:20:36,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:20:38,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 151.61142 ± 77.384
2025-05-13 13:20:38,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [152.45038, 235.42421, 106.08427, 28.534037, 149.2574, 203.2174, 149.89511, 279.35028, 26.962303, 184.93872]
2025-05-13 13:20:38,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 124.0, 75.0, 30.0, 92.0, 114.0, 95.0, 127.0, 28.0, 101.0]
2025-05-13 13:20:38,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 38 minutes, 34 seconds)
2025-05-13 13:24:07,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:24:09,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 188.17490 ± 74.678
2025-05-13 13:24:09,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [94.87347, 233.33179, 307.16605, 169.44896, 197.69957, 133.47125, 117.47705, 111.20607, 316.0156, 201.05923]
2025-05-13 13:24:09,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 113.0, 136.0, 93.0, 102.0, 80.0, 80.0, 76.0, 137.0, 108.0]
2025-05-13 13:24:09,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 35 minutes, 6 seconds)
2025-05-13 13:27:40,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:27:42,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 160.96843 ± 58.911
2025-05-13 13:27:42,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [166.87846, 173.46457, 120.4229, 24.32107, 131.97293, 209.77164, 179.29312, 260.06305, 188.88358, 154.61308]
2025-05-13 13:27:42,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 106.0, 88.0, 27.0, 84.0, 113.0, 95.0, 123.0, 105.0, 93.0]
2025-05-13 13:27:42,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 31 minutes, 44 seconds)
2025-05-13 13:31:10,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:31:12,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 196.25723 ± 101.078
2025-05-13 13:31:12,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [165.47108, 458.0839, 86.89883, 237.60397, 261.31644, 122.22958, 120.7471, 160.1248, 148.79547, 201.3011]
2025-05-13 13:31:12,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 175.0, 62.0, 121.0, 134.0, 79.0, 83.0, 92.0, 96.0, 112.0]
2025-05-13 13:31:12,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 28 minutes, 8 seconds)
2025-05-13 13:34:43,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:34:45,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 123.20808 ± 38.787
2025-05-13 13:34:45,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [124.21165, 172.39464, 104.92837, 157.03012, 104.161194, 110.45825, 134.52702, 161.34464, 28.7828, 134.24203]
2025-05-13 13:34:45,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 104.0, 69.0, 104.0, 78.0, 80.0, 94.0, 98.0, 31.0, 85.0]
2025-05-13 13:34:45,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 24 minutes, 32 seconds)
2025-05-13 13:38:14,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:38:16,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 183.59048 ± 39.902
2025-05-13 13:38:16,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [126.995026, 229.4262, 161.51288, 262.95273, 193.23251, 181.09578, 203.53192, 190.81316, 140.37837, 145.966]
2025-05-13 13:38:16,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 118.0, 100.0, 125.0, 104.0, 113.0, 131.0, 102.0, 90.0, 95.0]
2025-05-13 13:38:16,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 21 minutes, 6 seconds)
2025-05-13 13:41:47,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:41:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 217.38550 ± 66.712
2025-05-13 13:41:50,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [205.9179, 197.05789, 285.91574, 284.07788, 324.84595, 157.28549, 135.58719, 115.518166, 201.82642, 265.82227]
2025-05-13 13:41:50,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 120.0, 132.0, 127.0, 174.0, 101.0, 84.0, 82.0, 106.0, 134.0]
2025-05-13 13:41:50,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 17 minutes, 46 seconds)
2025-05-13 13:45:19,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:45:21,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 176.39552 ± 51.587
2025-05-13 13:45:21,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [237.20026, 105.44604, 137.73167, 245.38887, 172.76408, 116.40697, 215.67342, 229.37685, 116.82245, 187.14464]
2025-05-13 13:45:21,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 78.0, 100.0, 120.0, 108.0, 83.0, 114.0, 116.0, 72.0, 118.0]
2025-05-13 13:45:21,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 14 minutes, 9 seconds)
2025-05-13 13:48:50,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:48:52,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 154.79733 ± 41.682
2025-05-13 13:48:52,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [142.75206, 147.90926, 126.91938, 193.89188, 86.636314, 165.67491, 104.31243, 155.52042, 190.0791, 234.27774]
2025-05-13 13:48:52,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 95.0, 86.0, 109.0, 60.0, 100.0, 71.0, 92.0, 116.0, 130.0]
2025-05-13 13:48:52,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 10 minutes, 40 seconds)
2025-05-13 13:52:21,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:52:24,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 190.58267 ± 44.822
2025-05-13 13:52:24,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [174.07848, 291.582, 172.54634, 166.92781, 174.64136, 146.42657, 144.38481, 184.23387, 256.4468, 194.55858]
2025-05-13 13:52:24,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 145.0, 98.0, 106.0, 102.0, 96.0, 89.0, 113.0, 132.0, 113.0]
2025-05-13 13:52:24,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 7 minutes, 3 seconds)
2025-05-13 13:55:52,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:55:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 175.35858 ± 42.176
2025-05-13 13:55:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [141.78989, 193.82426, 159.24709, 257.1221, 170.65515, 176.93884, 228.29501, 117.98782, 188.01437, 119.711334]
2025-05-13 13:55:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 106.0, 98.0, 125.0, 99.0, 104.0, 120.0, 77.0, 112.0, 83.0]
2025-05-13 13:55:55,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 3 minutes, 31 seconds)
2025-05-13 13:59:23,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:59:25,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 162.01834 ± 75.203
2025-05-13 13:59:25,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [28.305332, 296.97186, 279.5223, 164.29158, 154.85718, 99.11564, 178.26398, 130.44473, 124.87125, 163.53963]
2025-05-13 13:59:25,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 141.0, 128.0, 97.0, 90.0, 67.0, 104.0, 98.0, 82.0, 94.0]
2025-05-13 13:59:25,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 59 minutes, 48 seconds)
2025-05-13 14:02:56,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:02:58,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 144.48433 ± 83.913
2025-05-13 14:02:58,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [140.45778, 114.08959, 107.47438, 175.9451, 25.523388, 326.6139, 228.77896, 152.82013, 140.92053, 32.21963]
2025-05-13 14:02:58,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 84.0, 77.0, 99.0, 30.0, 152.0, 119.0, 92.0, 87.0, 32.0]
2025-05-13 14:02:58,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 56 minutes, 21 seconds)
2025-05-13 14:06:28,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:06:30,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 186.68695 ± 51.391
2025-05-13 14:06:30,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [251.64493, 146.74197, 202.5105, 249.68883, 126.162094, 105.73948, 147.40886, 241.6565, 223.68274, 171.63351]
2025-05-13 14:06:30,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 88.0, 110.0, 130.0, 89.0, 76.0, 102.0, 126.0, 124.0, 104.0]
2025-05-13 14:06:30,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 52 minutes, 53 seconds)
2025-05-13 14:09:57,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:10:00,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 182.91714 ± 25.743
2025-05-13 14:10:00,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [202.43677, 195.39062, 200.59361, 189.67648, 151.84715, 155.78825, 184.39745, 222.88402, 134.40732, 191.74971]
2025-05-13 14:10:00,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 107.0, 114.0, 104.0, 88.0, 97.0, 106.0, 118.0, 95.0, 109.0]
2025-05-13 14:10:00,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 49 minutes, 16 seconds)
2025-05-13 14:13:29,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:13:30,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 131.58359 ± 56.788
2025-05-13 14:13:30,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [130.22978, 177.10463, 191.17828, 32.117508, 185.727, 127.19311, 127.96246, 165.587, 156.61717, 22.11892]
2025-05-13 14:13:30,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 105.0, 107.0, 32.0, 108.0, 91.0, 84.0, 99.0, 117.0, 26.0]
2025-05-13 14:13:30,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 45 minutes, 45 seconds)
2025-05-13 14:16:59,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:17:02,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 207.55905 ± 130.428
2025-05-13 14:17:02,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [29.17921, 127.56625, 247.79082, 188.72687, 129.83685, 166.56596, 147.99353, 368.05386, 508.78333, 161.09396]
2025-05-13 14:17:02,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 91.0, 120.0, 117.0, 82.0, 96.0, 95.0, 159.0, 180.0, 88.0]
2025-05-13 14:17:02,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 42 minutes, 15 seconds)
2025-05-13 14:20:31,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:20:33,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 159.74814 ± 95.686
2025-05-13 14:20:33,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [345.2908, 263.3599, 134.74313, 109.39783, 192.9721, 205.77097, 25.123184, 186.76956, 22.497643, 111.55627]
2025-05-13 14:20:33,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [154.0, 130.0, 97.0, 70.0, 113.0, 113.0, 29.0, 113.0, 27.0, 77.0]
2025-05-13 14:20:33,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 38 minutes, 42 seconds)
2025-05-13 14:24:02,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:24:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 174.90300 ± 67.466
2025-05-13 14:24:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [174.30632, 172.39517, 160.45732, 129.53635, 262.93008, 28.245497, 188.8684, 146.64407, 199.43262, 286.2143]
2025-05-13 14:24:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 105.0, 92.0, 83.0, 134.0, 32.0, 111.0, 99.0, 115.0, 140.0]
2025-05-13 14:24:04,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 35 minutes, 7 seconds)
2025-05-13 14:27:35,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:27:36,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 143.88452 ± 71.623
2025-05-13 14:27:36,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [182.41316, 199.02504, 119.57352, 103.11093, 243.79109, 26.907372, 156.87968, 25.15357, 223.78355, 158.2073]
2025-05-13 14:27:36,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 111.0, 79.0, 75.0, 127.0, 30.0, 90.0, 28.0, 118.0, 101.0]
2025-05-13 14:27:36,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 31 minutes, 42 seconds)
2025-05-13 14:31:07,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:31:09,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 179.12288 ± 93.709
2025-05-13 14:31:09,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [31.839241, 128.31027, 254.94109, 402.75653, 197.56967, 123.17395, 210.6267, 136.42534, 145.52718, 160.05878]
2025-05-13 14:31:09,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 78.0, 136.0, 170.0, 117.0, 89.0, 117.0, 93.0, 93.0, 95.0]
2025-05-13 14:31:09,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 28 minutes, 14 seconds)
2025-05-13 14:34:37,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:34:40,147 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 182.09268 ± 87.797
2025-05-13 14:34:40,147 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [221.62723, 260.33554, 121.3484, 138.14177, 342.863, 24.94495, 203.92857, 117.939354, 259.3703, 130.42755]
2025-05-13 14:34:40,147 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 123.0, 78.0, 94.0, 149.0, 32.0, 105.0, 82.0, 121.0, 87.0]
2025-05-13 14:34:40,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 24 minutes, 41 seconds)
2025-05-13 14:38:08,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:38:11,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 196.10144 ± 103.121
2025-05-13 14:38:11,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [30.421486, 131.98471, 174.59077, 233.49506, 281.15768, 206.49472, 346.1269, 204.47997, 30.87508, 321.38806]
2025-05-13 14:38:11,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 89.0, 92.0, 129.0, 138.0, 112.0, 167.0, 114.0, 31.0, 150.0]
2025-05-13 14:38:11,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 21 minutes, 8 seconds)
2025-05-13 14:41:41,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:41:43,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 174.77426 ± 64.914
2025-05-13 14:41:43,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [172.9372, 121.56575, 192.72783, 201.21202, 145.88704, 115.73803, 129.51018, 163.8816, 351.70163, 152.5813]
2025-05-13 14:41:43,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 87.0, 111.0, 108.0, 97.0, 80.0, 82.0, 98.0, 179.0, 97.0]
2025-05-13 14:41:43,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 17 minutes, 39 seconds)
2025-05-13 14:45:11,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:45:13,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 173.72200 ± 72.765
2025-05-13 14:45:13,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [268.74692, 200.85605, 158.70856, 277.60986, 247.45445, 28.860092, 160.3481, 119.30097, 138.31982, 137.01517]
2025-05-13 14:45:13,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 111.0, 103.0, 135.0, 122.0, 29.0, 106.0, 77.0, 91.0, 90.0]
2025-05-13 14:45:13,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 14 minutes, 5 seconds)
2025-05-13 14:48:43,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:48:45,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 150.41579 ± 71.674
2025-05-13 14:48:45,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [138.38777, 152.37814, 24.222982, 95.94334, 150.44415, 142.2426, 140.72856, 259.11206, 288.0088, 112.6893]
2025-05-13 14:48:45,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 88.0, 28.0, 68.0, 96.0, 87.0, 93.0, 135.0, 135.0, 84.0]
2025-05-13 14:48:45,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 10 minutes, 33 seconds)
2025-05-13 14:52:15,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:52:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 185.53360 ± 66.411
2025-05-13 14:52:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [111.70915, 206.58783, 359.5502, 119.67931, 173.04877, 149.19855, 159.29137, 207.75136, 205.82854, 162.691]
2025-05-13 14:52:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 113.0, 157.0, 87.0, 94.0, 90.0, 100.0, 114.0, 112.0, 98.0]
2025-05-13 14:52:17,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 2 seconds)
2025-05-13 14:55:46,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:55:48,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 191.44376 ± 88.265
2025-05-13 14:55:48,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [243.63344, 400.1272, 188.81723, 142.2495, 265.81137, 186.91505, 118.815285, 175.39017, 100.5104, 92.16784]
2025-05-13 14:55:48,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 162.0, 103.0, 91.0, 142.0, 119.0, 77.0, 97.0, 74.0, 67.0]
2025-05-13 14:55:48,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 31 seconds)
2025-05-13 14:59:18,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:59:21,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 212.83255 ± 81.678
2025-05-13 14:59:21,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [143.95476, 243.38498, 243.05925, 409.47864, 191.12968, 123.708206, 275.69757, 175.77985, 195.36493, 126.76766]
2025-05-13 14:59:21,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 127.0, 137.0, 171.0, 119.0, 85.0, 139.0, 105.0, 107.0, 83.0]
2025-05-13 14:59:21,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1251 [DEBUG]: Training session finished
