2025-05-13 09:06:21,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mda-mem2
2025-05-13 09:06:21,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mda-mem2
2025-05-13 09:06:21,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14c5eebeabd0>}
2025-05-13 09:06:21,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:21,827 baseline-bpql-mda-noisy-hopper:91 [WARNING]: args.assumed_delay != args.horizon: 2 != 24
2025-05-13 09:06:21,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-13 09:06:21,832 baseline-bpql-mda-noisy-hopper:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-13 09:06:21,832 baseline-bpql-mda-noisy-hopper:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:21,838 baseline-bpql-mda-noisy-hopper:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(3, 384, batch_first=True)
)
2025-05-13 09:06:22,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:22,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-13 09:09:13,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:09:13,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 64.96519 ± 10.713
2025-05-13 09:09:13,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [33.516937, 67.83751, 67.92715, 65.745995, 67.81935, 68.166374, 67.8859, 65.88161, 73.35012, 71.52101]
2025-05-13 09:09:13,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [38.0, 41.0, 41.0, 40.0, 41.0, 41.0, 41.0, 40.0, 44.0, 43.0]
2025-05-13 09:09:13,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (64.97) for latency ExtremeClogL1U23
2025-05-13 09:09:13,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 42 minutes, 35 seconds)
2025-05-13 09:12:14,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:12:16,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 143.27933 ± 107.286
2025-05-13 09:12:16,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [58.245975, 64.86371, 58.024002, 232.0356, 126.18532, 365.65073, 258.928, 189.01147, 41.369312, 38.479248]
2025-05-13 09:12:16,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 53.0, 41.0, 191.0, 133.0, 281.0, 237.0, 143.0, 43.0, 48.0]
2025-05-13 09:12:16,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (143.28) for latency ExtremeClogL1U23
2025-05-13 09:12:16,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 48 minutes, 59 seconds)
2025-05-13 09:15:15,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:15:18,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 217.75037 ± 176.761
2025-05-13 09:15:18,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [532.6277, 32.968124, 91.640816, 555.2958, 229.44118, 31.907774, 168.9066, 114.388596, 183.2213, 237.10574]
2025-05-13 09:15:18,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [519.0, 33.0, 78.0, 548.0, 217.0, 34.0, 155.0, 97.0, 158.0, 226.0]
2025-05-13 09:15:18,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (217.75) for latency ExtremeClogL1U23
2025-05-13 09:15:18,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 48 minutes, 38 seconds)
2025-05-13 09:18:15,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:18:16,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 154.35335 ± 109.787
2025-05-13 09:18:16,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [305.24603, 125.59185, 187.99408, 16.44229, 28.806381, 45.823467, 313.42334, 276.6233, 63.180298, 180.40236]
2025-05-13 09:18:16,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 65.0, 93.0, 17.0, 34.0, 33.0, 139.0, 114.0, 39.0, 89.0]
2025-05-13 09:18:16,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 45 minutes, 44 seconds)
2025-05-13 09:21:16,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:21:18,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 196.36716 ± 148.003
2025-05-13 09:21:18,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [367.9461, 176.82544, 414.65323, 431.31656, 91.26959, 11.138744, 64.18964, 121.33567, 64.65978, 220.33675]
2025-05-13 09:21:18,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [215.0, 176.0, 204.0, 272.0, 71.0, 14.0, 48.0, 116.0, 45.0, 177.0]
2025-05-13 09:21:18,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 43 minutes, 34 seconds)
2025-05-13 09:24:18,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:24:19,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 260.70804 ± 124.221
2025-05-13 09:24:19,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [342.74344, 342.68573, 323.88327, 128.96375, 331.34332, 347.3954, 338.47635, 83.35623, 14.157031, 354.07565]
2025-05-13 09:24:19,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 126.0, 121.0, 65.0, 169.0, 130.0, 126.0, 47.0, 16.0, 130.0]
2025-05-13 09:24:19,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (260.71) for latency ExtremeClogL1U23
2025-05-13 09:24:19,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 43 minutes, 45 seconds)
2025-05-13 09:27:18,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:27:19,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 247.37137 ± 146.959
2025-05-13 09:27:19,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [408.88358, 205.80759, 391.4406, 29.324835, 36.027153, 398.18396, 91.331245, 285.29123, 219.2268, 408.19644]
2025-05-13 09:27:19,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [161.0, 89.0, 138.0, 28.0, 42.0, 146.0, 54.0, 116.0, 104.0, 158.0]
2025-05-13 09:27:19,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 39 minutes, 59 seconds)
2025-05-13 09:30:16,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:30:18,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 325.72382 ± 118.014
2025-05-13 09:30:18,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [134.11082, 370.06082, 452.7375, 434.05548, 329.24393, 373.15976, 401.24612, 367.5877, 74.77422, 320.262]
2025-05-13 09:30:18,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 173.0, 152.0, 152.0, 134.0, 181.0, 148.0, 149.0, 43.0, 159.0]
2025-05-13 09:30:18,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (325.72) for latency ExtremeClogL1U23
2025-05-13 09:30:18,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 36 minutes, 4 seconds)
2025-05-13 09:33:16,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:33:18,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 312.08316 ± 69.372
2025-05-13 09:33:18,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [324.2144, 362.91602, 354.93652, 331.61676, 330.9887, 348.61792, 214.50757, 351.2966, 357.94464, 143.79243]
2025-05-13 09:33:18,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 150.0, 148.0, 155.0, 140.0, 149.0, 107.0, 145.0, 151.0, 75.0]
2025-05-13 09:33:18,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 33 minutes, 22 seconds)
2025-05-13 09:36:16,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:36:17,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 239.60698 ± 155.237
2025-05-13 09:36:17,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [47.02491, 41.789272, 255.47977, 307.9692, 46.978832, 376.67697, 396.45084, 414.01776, 410.50668, 99.175545]
2025-05-13 09:36:17,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [37.0, 39.0, 144.0, 118.0, 55.0, 139.0, 152.0, 165.0, 156.0, 55.0]
2025-05-13 09:36:17,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 29 minutes, 50 seconds)
2025-05-13 09:39:13,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:39:15,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 343.75320 ± 94.989
2025-05-13 09:39:15,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [391.38025, 348.1006, 82.86694, 266.84705, 388.9357, 393.43216, 390.92096, 379.9524, 405.68747, 389.40845]
2025-05-13 09:39:15,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 126.0, 64.0, 118.0, 143.0, 147.0, 144.0, 139.0, 147.0, 143.0]
2025-05-13 09:39:15,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (343.75) for latency ExtremeClogL1U23
2025-05-13 09:39:15,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 25 minutes, 46 seconds)
2025-05-13 09:42:12,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:42:14,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 371.15298 ± 83.186
2025-05-13 09:42:14,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [201.60168, 419.72696, 424.94003, 399.80972, 443.53784, 404.6121, 213.09448, 415.04395, 392.06747, 397.09583]
2025-05-13 09:42:14,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 186.0, 165.0, 142.0, 177.0, 139.0, 92.0, 149.0, 139.0, 140.0]
2025-05-13 09:42:14,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (371.15) for latency ExtremeClogL1U23
2025-05-13 09:42:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 22 minutes, 29 seconds)
2025-05-13 09:45:11,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:45:13,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 325.40900 ± 90.397
2025-05-13 09:45:13,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [366.25644, 365.4166, 246.53465, 360.62762, 423.72192, 366.65726, 88.156235, 312.86032, 346.05298, 377.80618]
2025-05-13 09:45:13,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 129.0, 99.0, 127.0, 218.0, 129.0, 51.0, 131.0, 164.0, 133.0]
2025-05-13 09:45:13,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 19 minutes, 27 seconds)
2025-05-13 09:48:10,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:48:12,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 296.68317 ± 140.860
2025-05-13 09:48:12,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [381.41434, 399.34253, 70.46507, 411.03424, 440.43573, 109.506996, 431.67297, 167.4683, 165.69511, 389.79642]
2025-05-13 09:48:12,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 168.0, 48.0, 215.0, 155.0, 57.0, 190.0, 102.0, 107.0, 155.0]
2025-05-13 09:48:12,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 16 minutes, 16 seconds)
2025-05-13 09:51:08,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:51:10,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 315.83185 ± 144.539
2025-05-13 09:51:10,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [368.74734, 376.71432, 155.84808, 22.679865, 493.5427, 434.67346, 380.07672, 143.67505, 396.18427, 386.17676]
2025-05-13 09:51:10,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 132.0, 99.0, 28.0, 185.0, 155.0, 134.0, 69.0, 156.0, 135.0]
2025-05-13 09:51:10,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 12 minutes, 52 seconds)
2025-05-13 09:54:05,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:54:07,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 318.27036 ± 109.901
2025-05-13 09:54:07,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [257.182, 337.6421, 398.72678, 386.06894, 392.94003, 130.67624, 101.99808, 421.75705, 389.11185, 366.6007]
2025-05-13 09:54:07,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 149.0, 143.0, 150.0, 135.0, 78.0, 56.0, 169.0, 134.0, 128.0]
2025-05-13 09:54:07,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 9 minutes, 48 seconds)
2025-05-13 09:57:04,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:57:05,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 316.46008 ± 80.892
2025-05-13 09:57:05,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [261.96222, 318.99554, 362.26355, 194.48376, 393.63547, 261.5142, 381.34482, 458.91275, 207.17418, 324.3144]
2025-05-13 09:57:05,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 186.0, 142.0, 88.0, 138.0, 104.0, 135.0, 185.0, 92.0, 118.0]
2025-05-13 09:57:05,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 6 minutes, 37 seconds)
2025-05-13 10:00:01,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:00:02,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 308.05429 ± 173.451
2025-05-13 10:00:02,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [10.920444, 130.25594, 424.26965, 452.84232, 18.063477, 388.8172, 466.33835, 442.8503, 327.36487, 418.82037]
2025-05-13 10:00:02,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 73.0, 182.0, 151.0, 17.0, 138.0, 153.0, 148.0, 144.0, 144.0]
2025-05-13 10:00:02,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 3 minutes, 15 seconds)
2025-05-13 10:02:58,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:03:00,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 366.75665 ± 131.123
2025-05-13 10:03:00,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [193.18541, 437.20477, 444.22296, 314.5482, 58.27501, 431.26416, 450.57208, 435.30295, 488.294, 414.69696]
2025-05-13 10:03:00,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 159.0, 150.0, 162.0, 47.0, 149.0, 150.0, 148.0, 183.0, 146.0]
2025-05-13 10:03:00,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 59 minutes, 50 seconds)
2025-05-13 10:05:57,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:05:59,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 413.37686 ± 133.946
2025-05-13 10:05:59,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [425.3899, 476.4078, 468.03763, 16.442884, 432.75507, 483.0984, 423.6279, 472.4575, 468.94876, 466.60254]
2025-05-13 10:05:59,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 157.0, 153.0, 16.0, 152.0, 159.0, 163.0, 156.0, 156.0, 157.0]
2025-05-13 10:05:59,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (413.38) for latency ExtremeClogL1U23
2025-05-13 10:05:59,338 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 57 minutes, 9 seconds)
2025-05-13 10:08:54,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:08:56,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 337.61844 ± 168.454
2025-05-13 10:08:56,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [450.10846, 129.29405, 463.69897, 467.33118, 12.473819, 436.81638, 454.74106, 350.7383, 127.33549, 483.64667]
2025-05-13 10:08:56,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 67.0, 165.0, 162.0, 14.0, 158.0, 170.0, 151.0, 64.0, 181.0]
2025-05-13 10:08:56,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 54 minutes, 5 seconds)
2025-05-13 10:11:51,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:11:52,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 338.82651 ± 110.325
2025-05-13 10:11:52,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [423.77045, 391.05008, 444.67148, 384.33737, 238.32846, 412.0942, 118.825775, 428.2792, 178.36905, 368.53894]
2025-05-13 10:11:52,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 161.0, 148.0, 161.0, 117.0, 168.0, 72.0, 186.0, 100.0, 169.0]
2025-05-13 10:11:52,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 50 minutes, 37 seconds)
2025-05-13 10:14:50,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:14:52,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 281.53622 ± 200.356
2025-05-13 10:14:52,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [461.0314, 466.9655, 40.352783, 495.80475, 358.3069, 47.230976, 412.55112, 41.808266, 460.99384, 30.316736]
2025-05-13 10:14:52,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 152.0, 32.0, 180.0, 185.0, 41.0, 157.0, 40.0, 190.0, 26.0]
2025-05-13 10:14:52,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 48 minutes, 12 seconds)
2025-05-13 10:17:47,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:17:48,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 363.08908 ± 163.407
2025-05-13 10:17:48,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [448.76303, 35.17513, 457.8572, 45.218098, 475.7143, 463.3602, 430.5738, 443.27158, 453.6627, 377.29495]
2025-05-13 10:17:48,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 28.0, 152.0, 34.0, 169.0, 155.0, 148.0, 150.0, 161.0, 137.0]
2025-05-13 10:17:48,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 45 minutes)
2025-05-13 10:20:44,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:20:46,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 443.04419 ± 59.391
2025-05-13 10:20:46,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [481.64145, 448.86438, 414.78693, 431.20215, 473.24728, 470.86737, 445.77844, 280.8084, 485.4548, 497.79068]
2025-05-13 10:20:46,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 150.0, 141.0, 147.0, 158.0, 154.0, 148.0, 124.0, 158.0, 185.0]
2025-05-13 10:20:46,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (443.04) for latency ExtremeClogL1U23
2025-05-13 10:20:46,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 41 minutes, 47 seconds)
2025-05-13 10:23:43,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:23:45,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 365.65790 ± 121.929
2025-05-13 10:23:45,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [476.1552, 419.04578, 474.52484, 421.6468, 110.44544, 474.12747, 271.4036, 457.5569, 340.0821, 211.59094]
2025-05-13 10:23:45,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 143.0, 155.0, 143.0, 59.0, 160.0, 105.0, 153.0, 133.0, 97.0]
2025-05-13 10:23:45,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 39 minutes, 12 seconds)
2025-05-13 10:26:39,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:26:41,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 361.80817 ± 169.193
2025-05-13 10:26:41,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [474.53638, 467.5713, 110.6639, 500.42972, 407.7774, 115.449425, 513.5859, 93.72753, 458.01422, 476.32596]
2025-05-13 10:26:41,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 153.0, 62.0, 177.0, 147.0, 95.0, 165.0, 53.0, 162.0, 165.0]
2025-05-13 10:26:41,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 36 minutes, 11 seconds)
2025-05-13 10:29:38,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:29:40,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 312.01437 ± 141.661
2025-05-13 10:29:40,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [208.47946, 514.1844, 5.619414, 201.8815, 394.2577, 333.09848, 413.46497, 410.26822, 408.36966, 230.52013]
2025-05-13 10:29:40,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 177.0, 9.0, 89.0, 140.0, 124.0, 160.0, 140.0, 140.0, 98.0]
2025-05-13 10:29:40,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 33 minutes, 8 seconds)
2025-05-13 10:32:36,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:32:37,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 377.15082 ± 177.356
2025-05-13 10:32:37,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [350.64587, 485.577, 465.2384, 60.593945, 483.3783, 451.32602, 493.01465, 447.0599, 7.616372, 527.0577]
2025-05-13 10:32:37,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 164.0, 165.0, 42.0, 182.0, 162.0, 160.0, 158.0, 9.0, 190.0]
2025-05-13 10:32:37,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 30 minutes, 27 seconds)
2025-05-13 10:35:33,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:35:35,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 377.17691 ± 134.484
2025-05-13 10:35:35,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [517.45496, 469.70932, 202.4356, 118.99698, 498.86484, 331.29587, 526.5223, 318.86688, 472.096, 315.52646]
2025-05-13 10:35:35,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 153.0, 89.0, 80.0, 162.0, 129.0, 189.0, 196.0, 171.0, 132.0]
2025-05-13 10:35:35,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 27 minutes, 20 seconds)
2025-05-13 10:38:30,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:38:31,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 381.55347 ± 193.158
2025-05-13 10:38:31,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [105.21052, 556.86176, 75.08674, 98.691696, 481.22928, 540.23553, 462.41202, 443.35052, 477.56503, 574.8917]
2025-05-13 10:38:31,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 189.0, 47.0, 71.0, 158.0, 174.0, 151.0, 147.0, 155.0, 207.0]
2025-05-13 10:38:31,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 23 minutes, 58 seconds)
2025-05-13 10:41:28,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:41:30,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 362.70172 ± 246.977
2025-05-13 10:41:30,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [549.30884, 129.77827, 68.53177, 63.060135, 568.8259, 512.0422, 676.3876, 520.3748, 533.0787, 5.628992]
2025-05-13 10:41:30,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [185.0, 68.0, 42.0, 37.0, 249.0, 181.0, 272.0, 167.0, 174.0, 9.0]
2025-05-13 10:41:30,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 21 minutes, 29 seconds)
2025-05-13 10:44:26,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:44:27,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 437.25909 ± 175.199
2025-05-13 10:44:27,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [546.426, 525.93823, 48.624767, 497.62323, 537.6871, 601.5797, 154.87268, 454.47064, 560.98267, 444.3856]
2025-05-13 10:44:27,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 168.0, 32.0, 173.0, 174.0, 188.0, 75.0, 150.0, 184.0, 149.0]
2025-05-13 10:44:27,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 18 minutes, 13 seconds)
2025-05-13 10:47:30,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:47:31,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 300.66791 ± 235.947
2025-05-13 10:47:31,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [14.944091, 752.5763, 394.39996, 32.21482, 313.9092, 319.3198, 326.10028, 613.9267, 7.4914184, 231.79672]
2025-05-13 10:47:31,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 317.0, 146.0, 36.0, 142.0, 175.0, 148.0, 230.0, 9.0, 112.0]
2025-05-13 10:47:31,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 16 minutes, 39 seconds)
2025-05-13 10:50:39,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:50:41,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 415.37067 ± 165.068
2025-05-13 10:50:41,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [572.49365, 573.8601, 533.06464, 313.69342, 435.26816, 309.47107, 388.17517, 528.4568, 6.3147006, 492.90903]
2025-05-13 10:50:41,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 182.0, 170.0, 120.0, 195.0, 126.0, 138.0, 168.0, 9.0, 211.0]
2025-05-13 10:50:41,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 16 minutes, 19 seconds)
2025-05-13 10:53:51,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:53:53,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 255.97159 ± 230.279
2025-05-13 10:53:53,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [592.5088, 602.98, 132.88474, 100.439514, 24.38968, 24.74383, 183.85852, 553.29675, 318.20953, 26.404465]
2025-05-13 10:53:53,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 191.0, 69.0, 57.0, 27.0, 28.0, 88.0, 210.0, 127.0, 28.0]
2025-05-13 10:53:53,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 16 minutes, 31 seconds)
2025-05-13 10:57:03,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:57:05,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 435.85040 ± 171.567
2025-05-13 10:57:05,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [580.37604, 152.12303, 234.98395, 562.19714, 547.6232, 560.1532, 597.3168, 197.0484, 573.1118, 353.57013]
2025-05-13 10:57:05,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [178.0, 73.0, 106.0, 195.0, 174.0, 179.0, 195.0, 91.0, 178.0, 130.0]
2025-05-13 10:57:05,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 16 minutes, 18 seconds)
2025-05-13 11:00:01,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:00:03,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 432.72681 ± 197.533
2025-05-13 11:00:03,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [659.1004, 355.6908, 529.39496, 31.762835, 450.25742, 575.55084, 125.33664, 566.21075, 605.5459, 428.41733]
2025-05-13 11:00:03,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [206.0, 147.0, 172.0, 26.0, 189.0, 190.0, 64.0, 194.0, 208.0, 151.0]
2025-05-13 11:00:03,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 13 minutes, 22 seconds)
2025-05-13 11:03:00,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:03:02,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 387.88983 ± 163.478
2025-05-13 11:03:02,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [131.28525, 309.29266, 570.2805, 515.29034, 516.5936, 442.09222, 442.3994, 548.3666, 320.24707, 83.05079]
2025-05-13 11:03:02,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 117.0, 180.0, 166.0, 166.0, 157.0, 151.0, 174.0, 119.0, 61.0]
2025-05-13 11:03:02,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 9 minutes, 13 seconds)
2025-05-13 11:05:56,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:05:58,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 465.02206 ± 189.808
2025-05-13 11:05:58,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [578.18713, 588.176, 592.6238, 575.9181, 590.8969, 601.54333, 71.303444, 329.89603, 553.5199, 168.15634]
2025-05-13 11:05:58,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 206.0, 188.0, 214.0, 212.0, 220.0, 44.0, 127.0, 187.0, 81.0]
2025-05-13 11:05:58,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (465.02) for latency ExtremeClogL1U23
2025-05-13 11:05:58,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 3 minutes, 26 seconds)
2025-05-13 11:08:55,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:08:57,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 392.53564 ± 159.170
2025-05-13 11:08:57,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [458.39508, 577.5264, 337.2609, 543.58386, 173.07265, 87.14248, 393.0382, 521.275, 295.13263, 538.9294]
2025-05-13 11:08:57,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [154.0, 195.0, 126.0, 195.0, 82.0, 55.0, 141.0, 176.0, 115.0, 189.0]
2025-05-13 11:08:57,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 57 minutes, 52 seconds)
2025-05-13 11:12:02,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:12:03,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 321.01801 ± 206.597
2025-05-13 11:12:03,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [566.73755, 112.525185, 60.99254, 548.24255, 478.53992, 73.69749, 625.0565, 169.96457, 277.0044, 297.41943]
2025-05-13 11:12:03,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [179.0, 60.0, 66.0, 175.0, 159.0, 43.0, 234.0, 78.0, 127.0, 175.0]
2025-05-13 11:12:03,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 53 minutes, 43 seconds)
2025-05-13 11:15:06,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:15:08,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 425.77139 ± 174.458
2025-05-13 11:15:08,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [311.56247, 522.4503, 207.94946, 382.44086, 552.1755, 55.10295, 521.2566, 539.6183, 519.28265, 645.875]
2025-05-13 11:15:08,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 167.0, 92.0, 138.0, 175.0, 56.0, 165.0, 171.0, 170.0, 232.0]
2025-05-13 11:15:08,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 51 minutes, 58 seconds)
2025-05-13 11:18:09,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:18:10,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 462.41608 ± 115.751
2025-05-13 11:18:10,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [500.62646, 558.5451, 459.70514, 443.60614, 441.8149, 479.7167, 475.2002, 171.54218, 654.4447, 438.95877]
2025-05-13 11:18:10,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [161.0, 189.0, 151.0, 150.0, 150.0, 158.0, 157.0, 81.0, 232.0, 150.0]
2025-05-13 11:18:11,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 49 minutes, 36 seconds)
2025-05-13 11:21:05,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:21:08,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 504.47687 ± 110.447
2025-05-13 11:21:08,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [604.89417, 522.21655, 533.0698, 384.5487, 684.4378, 327.61676, 611.7806, 486.328, 527.9657, 361.91037]
2025-05-13 11:21:08,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 167.0, 170.0, 139.0, 316.0, 125.0, 261.0, 158.0, 168.0, 132.0]
2025-05-13 11:21:08,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (504.48) for latency ExtremeClogL1U23
2025-05-13 11:21:08,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 46 minutes, 46 seconds)
2025-05-13 11:24:04,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:24:06,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 327.06567 ± 209.697
2025-05-13 11:24:06,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [573.2858, 8.577793, 295.1923, 579.60657, 591.8639, 181.29863, 103.94855, 213.70508, 529.6579, 193.52025]
2025-05-13 11:24:06,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [179.0, 10.0, 127.0, 185.0, 190.0, 84.0, 65.0, 98.0, 169.0, 96.0]
2025-05-13 11:24:06,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 43 minutes, 33 seconds)
2025-05-13 11:27:02,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:27:04,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 466.16608 ± 231.100
2025-05-13 11:27:04,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [541.66364, 199.89407, 609.74054, 558.438, 10.552529, 845.45123, 480.68118, 246.32268, 570.7407, 598.1762]
2025-05-13 11:27:04,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 94.0, 197.0, 177.0, 14.0, 278.0, 163.0, 137.0, 184.0, 215.0]
2025-05-13 11:27:04,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 39 minutes, 8 seconds)
2025-05-13 11:30:02,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:30:03,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 326.47339 ± 243.864
2025-05-13 11:30:03,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [42.774883, 576.3904, 614.58185, 509.40732, 125.77151, 544.34, 6.0944424, 582.8984, 93.13225, 169.34265]
2025-05-13 11:30:03,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 202.0, 204.0, 162.0, 66.0, 170.0, 9.0, 195.0, 80.0, 79.0]
2025-05-13 11:30:03,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 35 minutes, 8 seconds)
2025-05-13 11:33:06,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:33:08,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 369.06647 ± 213.324
2025-05-13 11:33:08,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [567.0347, 597.5224, 582.9262, 511.2365, 599.3305, 74.97656, 82.86986, 148.52347, 211.96768, 314.27658]
2025-05-13 11:33:08,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 194.0, 205.0, 165.0, 187.0, 44.0, 71.0, 92.0, 107.0, 127.0]
2025-05-13 11:33:08,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 32 minutes, 37 seconds)
2025-05-13 11:36:02,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:36:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 347.53186 ± 205.090
2025-05-13 11:36:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [625.3479, 115.46391, 86.52317, 572.002, 284.4629, 58.33604, 344.45737, 586.4272, 502.70328, 299.59482]
2025-05-13 11:36:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 63.0, 63.0, 198.0, 116.0, 100.0, 138.0, 185.0, 166.0, 115.0]
2025-05-13 11:36:03,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 29 minutes, 16 seconds)
2025-05-13 11:39:01,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:39:03,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 514.13049 ± 172.121
2025-05-13 11:39:03,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [550.5068, 622.36304, 541.367, 619.59845, 574.2531, 608.0948, 521.422, 8.391455, 568.1206, 527.1876]
2025-05-13 11:39:03,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 204.0, 172.0, 198.0, 181.0, 193.0, 165.0, 11.0, 206.0, 167.0]
2025-05-13 11:39:03,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (514.13) for latency ExtremeClogL1U23
2025-05-13 11:39:03,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 26 minutes, 34 seconds)
2025-05-13 11:42:00,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:42:02,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 419.66846 ± 205.086
2025-05-13 11:42:02,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [295.65607, 518.2796, 89.764, 593.06824, 582.3607, 76.2616, 625.2474, 271.30295, 597.7484, 546.99554]
2025-05-13 11:42:02,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 165.0, 51.0, 202.0, 221.0, 47.0, 200.0, 116.0, 191.0, 175.0]
2025-05-13 11:42:02,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 23 minutes, 37 seconds)
2025-05-13 11:44:59,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:45:01,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 436.57721 ± 209.568
2025-05-13 11:45:01,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [587.28064, 324.63107, 593.3462, 348.53296, 173.03348, 732.64996, 625.7358, 536.4045, 417.95383, 26.203638]
2025-05-13 11:45:01,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [185.0, 126.0, 271.0, 143.0, 85.0, 289.0, 244.0, 169.0, 145.0, 31.0]
2025-05-13 11:45:01,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 20 minutes, 39 seconds)
2025-05-13 11:47:54,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:47:56,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 433.37613 ± 295.720
2025-05-13 11:47:56,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [63.64686, 39.960327, 613.1636, 493.2853, 756.5044, 741.3113, 615.4451, 247.13368, 752.258, 11.05266]
2025-05-13 11:47:56,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [45.0, 38.0, 227.0, 169.0, 238.0, 232.0, 194.0, 107.0, 257.0, 13.0]
2025-05-13 11:47:56,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 16 minutes, 9 seconds)
2025-05-13 11:50:57,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:50:58,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 377.10690 ± 226.419
2025-05-13 11:50:58,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [340.06757, 589.7662, 113.6767, 551.0042, 687.8378, 613.61487, 146.05827, 25.996706, 485.62085, 217.42613]
2025-05-13 11:50:58,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 185.0, 68.0, 181.0, 214.0, 195.0, 74.0, 22.0, 168.0, 95.0]
2025-05-13 11:50:58,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 14 minutes, 15 seconds)
2025-05-13 11:53:52,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:53:54,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 460.36850 ± 195.570
2025-05-13 11:53:54,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [147.15753, 662.8119, 566.06354, 582.67224, 596.92413, 504.76346, 564.7989, 38.84709, 532.593, 407.05322]
2025-05-13 11:53:54,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 281.0, 177.0, 181.0, 186.0, 165.0, 178.0, 30.0, 193.0, 147.0]
2025-05-13 11:53:54,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 10 minutes, 37 seconds)
2025-05-13 11:56:51,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:56:53,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 423.41058 ± 300.211
2025-05-13 11:56:53,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [259.4886, 596.3432, 957.45667, 579.8315, 31.57185, 155.5902, 412.7456, 390.20175, 32.80133, 818.07526]
2025-05-13 11:56:53,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 189.0, 315.0, 184.0, 25.0, 100.0, 172.0, 141.0, 41.0, 261.0]
2025-05-13 11:56:53,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 7 minutes, 43 seconds)
2025-05-13 11:59:48,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:59:50,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 494.79901 ± 184.744
2025-05-13 11:59:50,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [307.3395, 82.48834, 582.94824, 829.25476, 542.6984, 548.22266, 476.29846, 488.55118, 599.6103, 490.578]
2025-05-13 11:59:50,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 90.0, 182.0, 293.0, 172.0, 172.0, 161.0, 161.0, 193.0, 159.0]
2025-05-13 11:59:50,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 4 minutes, 27 seconds)
2025-05-13 12:02:48,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:02:50,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 424.50424 ± 178.719
2025-05-13 12:02:50,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [8.962234, 538.22394, 507.33347, 277.84103, 636.6387, 485.6551, 506.08075, 515.5671, 523.1477, 245.59299]
2025-05-13 12:02:50,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 170.0, 163.0, 112.0, 211.0, 160.0, 161.0, 164.0, 175.0, 128.0]
2025-05-13 12:02:50,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 2 minutes, 6 seconds)
2025-05-13 12:05:44,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:05:45,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 346.32196 ± 246.967
2025-05-13 12:05:45,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [492.34235, 577.1477, 122.62987, 456.37433, 512.2923, 669.6932, 12.162775, 32.55, 43.54264, 544.4845]
2025-05-13 12:05:45,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 194.0, 64.0, 160.0, 163.0, 210.0, 13.0, 39.0, 51.0, 172.0]
2025-05-13 12:05:45,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 58 minutes, 17 seconds)
2025-05-13 12:08:45,147 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:08:47,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 425.46201 ± 230.915
2025-05-13 12:08:47,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [729.76965, 46.78988, 524.9434, 303.47192, 294.8826, 614.8954, 557.08105, 641.3201, 502.79642, 38.66947]
2025-05-13 12:08:47,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [281.0, 45.0, 166.0, 116.0, 122.0, 200.0, 178.0, 203.0, 162.0, 43.0]
2025-05-13 12:08:47,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 56 minutes, 3 seconds)
2025-05-13 12:11:42,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:11:44,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 520.03357 ± 109.443
2025-05-13 12:11:44,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [555.93567, 513.59705, 600.81396, 601.58124, 581.5653, 511.22058, 567.0616, 488.9808, 569.10114, 210.47874]
2025-05-13 12:11:44,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 164.0, 196.0, 187.0, 181.0, 163.0, 183.0, 156.0, 177.0, 93.0]
2025-05-13 12:11:44,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (520.03) for latency ExtremeClogL1U23
2025-05-13 12:11:44,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 52 minutes, 50 seconds)
2025-05-13 12:14:40,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:14:42,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 447.08252 ± 176.168
2025-05-13 12:14:42,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [366.91113, 550.3766, 654.09033, 208.79524, 514.9799, 63.952023, 609.05273, 569.1035, 479.47775, 454.0861]
2025-05-13 12:14:42,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 174.0, 227.0, 91.0, 163.0, 60.0, 194.0, 178.0, 158.0, 149.0]
2025-05-13 12:14:42,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 49 minutes, 57 seconds)
2025-05-13 12:17:38,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:17:40,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 516.95764 ± 152.263
2025-05-13 12:17:40,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [582.93146, 290.9574, 616.35767, 589.81635, 623.84784, 652.6297, 611.8575, 579.93787, 181.6467, 439.59357]
2025-05-13 12:17:40,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 131.0, 225.0, 188.0, 201.0, 205.0, 197.0, 181.0, 83.0, 154.0]
2025-05-13 12:17:40,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 46 minutes, 49 seconds)
2025-05-13 12:20:39,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:20:41,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 487.67474 ± 65.823
2025-05-13 12:20:41,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [521.73376, 527.6652, 525.4412, 557.71625, 313.86865, 457.3435, 447.69046, 498.09058, 518.84283, 508.35516]
2025-05-13 12:20:41,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [165.0, 167.0, 167.0, 175.0, 118.0, 188.0, 153.0, 159.0, 166.0, 162.0]
2025-05-13 12:20:41,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 44 minutes, 29 seconds)
2025-05-13 12:23:39,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:23:41,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 490.00488 ± 125.575
2025-05-13 12:23:41,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [529.7936, 408.10535, 563.66504, 224.68665, 618.9149, 567.6995, 541.4813, 306.88766, 536.7414, 602.07306]
2025-05-13 12:23:41,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [167.0, 145.0, 175.0, 94.0, 210.0, 179.0, 171.0, 119.0, 170.0, 194.0]
2025-05-13 12:23:41,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 41 minutes, 19 seconds)
2025-05-13 12:26:34,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:26:36,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 352.80481 ± 193.499
2025-05-13 12:26:36,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [499.1967, 562.56445, 437.896, 620.3982, 9.951195, 278.0111, 418.66983, 186.3288, 89.00923, 426.02267]
2025-05-13 12:26:36,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [179.0, 174.0, 147.0, 208.0, 15.0, 115.0, 146.0, 96.0, 54.0, 148.0]
2025-05-13 12:26:36,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 38 minutes, 9 seconds)
2025-05-13 12:29:36,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:29:38,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 400.48267 ± 218.403
2025-05-13 12:29:38,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [238.84703, 594.3299, 43.972385, 29.5342, 579.3971, 629.1029, 582.7626, 516.1214, 301.39655, 489.36246]
2025-05-13 12:29:38,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 191.0, 26.0, 28.0, 181.0, 228.0, 183.0, 164.0, 117.0, 179.0]
2025-05-13 12:29:38,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 35 minutes, 35 seconds)
2025-05-13 12:32:32,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:32:33,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 429.98175 ± 266.876
2025-05-13 12:32:33,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [685.0571, 543.6488, 535.7426, 8.491071, 603.4518, 567.3962, 114.49412, 400.13504, 36.098637, 805.3019]
2025-05-13 12:32:33,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [257.0, 171.0, 170.0, 10.0, 191.0, 178.0, 65.0, 196.0, 40.0, 248.0]
2025-05-13 12:32:33,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 32 minutes, 18 seconds)
2025-05-13 12:35:28,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:35:31,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 545.00604 ± 116.070
2025-05-13 12:35:31,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [509.42136, 476.59515, 856.4545, 535.6142, 503.40784, 534.6509, 516.6942, 536.8659, 597.3462, 383.0101]
2025-05-13 12:35:31,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [164.0, 196.0, 288.0, 169.0, 162.0, 169.0, 165.0, 169.0, 188.0, 149.0]
2025-05-13 12:35:31,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (545.01) for latency ExtremeClogL1U23
2025-05-13 12:35:31,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 28 minutes, 57 seconds)
2025-05-13 12:38:27,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:38:29,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 410.07025 ± 214.479
2025-05-13 12:38:29,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [65.25641, 551.1056, 586.9033, 551.87897, 477.55325, 195.92642, 617.41974, 348.30206, 641.40826, 64.948494]
2025-05-13 12:38:29,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [43.0, 172.0, 183.0, 173.0, 170.0, 89.0, 215.0, 162.0, 203.0, 41.0]
2025-05-13 12:38:29,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 25 minutes, 51 seconds)
2025-05-13 12:41:26,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:41:28,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 473.34100 ± 214.440
2025-05-13 12:41:28,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [258.4404, 658.4158, 626.8589, 592.41876, 188.71996, 31.569487, 652.23004, 573.33014, 540.1254, 611.3012]
2025-05-13 12:41:28,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 245.0, 202.0, 185.0, 85.0, 26.0, 227.0, 179.0, 182.0, 190.0]
2025-05-13 12:41:28,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 23 minutes, 15 seconds)
2025-05-13 12:44:23,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:44:25,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 509.07196 ± 158.786
2025-05-13 12:44:25,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [574.1968, 533.07245, 523.8935, 626.29254, 439.10233, 64.1572, 546.8548, 535.1062, 592.58923, 655.4545]
2025-05-13 12:44:25,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 168.0, 166.0, 196.0, 147.0, 55.0, 171.0, 191.0, 184.0, 276.0]
2025-05-13 12:44:25,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 19 minutes, 53 seconds)
2025-05-13 12:47:23,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:47:26,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 533.28760 ± 142.719
2025-05-13 12:47:26,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [622.9337, 609.6436, 571.25256, 318.67554, 688.4645, 601.9692, 203.27869, 565.3121, 575.94446, 575.40173]
2025-05-13 12:47:26,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 191.0, 178.0, 123.0, 260.0, 189.0, 113.0, 178.0, 180.0, 181.0]
2025-05-13 12:47:26,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 17 minutes, 19 seconds)
2025-05-13 12:50:21,889 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:50:24,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 487.04160 ± 198.097
2025-05-13 12:50:24,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [44.34804, 321.65842, 694.3305, 471.2623, 444.88873, 618.4935, 595.6096, 334.3082, 635.84796, 709.6687]
2025-05-13 12:50:24,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 143.0, 239.0, 176.0, 159.0, 196.0, 191.0, 143.0, 240.0, 220.0]
2025-05-13 12:50:24,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 14 minutes, 24 seconds)
2025-05-13 12:53:19,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:53:21,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 486.48315 ± 233.367
2025-05-13 12:53:21,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [605.33453, 246.14853, 678.3296, 473.2173, 119.838425, 838.86646, 605.8431, 112.56675, 610.49854, 574.18854]
2025-05-13 12:53:21,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 105.0, 252.0, 167.0, 85.0, 257.0, 220.0, 60.0, 212.0, 180.0]
2025-05-13 12:53:21,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 11 minutes, 24 seconds)
2025-05-13 12:56:17,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:56:19,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 432.14606 ± 183.255
2025-05-13 12:56:19,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [595.0407, 533.39905, 541.11096, 578.1893, 203.88438, 396.6115, 208.9957, 585.1874, 92.44627, 586.5951]
2025-05-13 12:56:19,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 166.0, 170.0, 181.0, 90.0, 145.0, 105.0, 182.0, 62.0, 210.0]
2025-05-13 12:56:19,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 8 minutes, 17 seconds)
2025-05-13 12:59:19,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:59:20,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 457.10190 ± 145.586
2025-05-13 12:59:20,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [529.7174, 517.5118, 295.11258, 122.18136, 589.0965, 510.63898, 516.85736, 354.98282, 622.1096, 512.81067]
2025-05-13 12:59:20,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [166.0, 163.0, 119.0, 91.0, 213.0, 163.0, 164.0, 133.0, 192.0, 163.0]
2025-05-13 12:59:21,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 5 minutes, 38 seconds)
2025-05-13 13:02:16,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:02:18,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 424.42197 ± 163.609
2025-05-13 13:02:18,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [563.8921, 421.45557, 555.43585, 590.985, 592.4444, 565.0243, 201.52167, 155.94681, 330.09473, 267.41937]
2025-05-13 13:02:18,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [177.0, 148.0, 173.0, 220.0, 188.0, 177.0, 90.0, 75.0, 127.0, 107.0]
2025-05-13 13:02:18,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 2 minutes, 27 seconds)
2025-05-13 13:05:13,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:05:15,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 455.41168 ± 171.329
2025-05-13 13:05:15,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [500.27463, 84.89155, 665.4093, 207.13327, 545.18475, 553.78705, 576.80255, 536.8071, 374.69598, 509.13028]
2025-05-13 13:05:15,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 49.0, 220.0, 92.0, 171.0, 173.0, 179.0, 169.0, 129.0, 167.0]
2025-05-13 13:05:15,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 59 minutes, 24 seconds)
2025-05-13 13:08:13,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:08:15,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 437.45044 ± 191.826
2025-05-13 13:08:15,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [610.73157, 640.836, 469.63284, 26.617937, 264.73505, 639.2422, 490.16245, 406.67435, 252.80406, 573.06824]
2025-05-13 13:08:15,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 210.0, 173.0, 28.0, 126.0, 228.0, 160.0, 150.0, 106.0, 184.0]
2025-05-13 13:08:15,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 56 minutes, 34 seconds)
2025-05-13 13:11:10,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:11:12,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 417.15729 ± 257.684
2025-05-13 13:11:12,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [44.855988, 642.093, 530.0617, 634.5176, 573.44135, 29.025728, 600.0983, 567.03937, 543.0964, 7.3431396]
2025-05-13 13:11:12,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [36.0, 228.0, 167.0, 199.0, 180.0, 23.0, 204.0, 176.0, 170.0, 9.0]
2025-05-13 13:11:12,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 53 minutes, 34 seconds)
2025-05-13 13:14:08,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:14:10,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 432.12421 ± 213.789
2025-05-13 13:14:10,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [304.9605, 585.1138, 517.601, 552.41785, 16.647085, 307.72256, 149.80045, 596.9209, 716.0477, 574.0101]
2025-05-13 13:14:10,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 183.0, 165.0, 172.0, 16.0, 133.0, 85.0, 199.0, 244.0, 179.0]
2025-05-13 13:14:10,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 50 minutes, 24 seconds)
2025-05-13 13:17:07,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:17:09,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 467.94400 ± 165.059
2025-05-13 13:17:09,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [543.49365, 526.5381, 769.70526, 510.06818, 548.32513, 322.35843, 111.53883, 547.54346, 409.97015, 389.89902]
2025-05-13 13:17:09,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 166.0, 237.0, 163.0, 191.0, 125.0, 83.0, 171.0, 144.0, 153.0]
2025-05-13 13:17:09,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 47 minutes, 32 seconds)
2025-05-13 13:20:06,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:20:07,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 398.24985 ± 184.767
2025-05-13 13:20:07,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [617.82385, 223.00264, 258.32773, 584.30975, 136.18452, 586.922, 308.4286, 171.95836, 546.68024, 548.8612]
2025-05-13 13:20:07,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [204.0, 97.0, 105.0, 199.0, 69.0, 197.0, 116.0, 87.0, 171.0, 172.0]
2025-05-13 13:20:07,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 44 minutes, 37 seconds)
2025-05-13 13:23:05,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:23:06,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 365.09576 ± 238.645
2025-05-13 13:23:06,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [522.70825, 560.6155, 615.7029, 315.73892, 316.44476, 25.926603, 655.0978, 549.6009, 53.933247, 35.18905]
2025-05-13 13:23:06,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [169.0, 177.0, 192.0, 134.0, 133.0, 31.0, 223.0, 172.0, 33.0, 38.0]
2025-05-13 13:23:06,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 41 minutes, 36 seconds)
2025-05-13 13:26:01,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:26:03,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 454.47379 ± 163.779
2025-05-13 13:26:03,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [472.30618, 80.417946, 498.89478, 555.0871, 533.0504, 543.6877, 505.024, 196.40796, 556.0639, 603.798]
2025-05-13 13:26:03,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 50.0, 165.0, 173.0, 167.0, 170.0, 165.0, 89.0, 212.0, 234.0]
2025-05-13 13:26:03,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 38 minutes, 37 seconds)
2025-05-13 13:29:02,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:29:04,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 419.29620 ± 165.367
2025-05-13 13:29:04,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [558.05206, 400.94006, 150.5304, 437.01898, 333.81793, 113.17029, 631.4135, 566.1987, 486.6184, 515.2015]
2025-05-13 13:29:04,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 149.0, 87.0, 154.0, 140.0, 63.0, 200.0, 176.0, 157.0, 162.0]
2025-05-13 13:29:04,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 35 minutes, 45 seconds)
2025-05-13 13:31:58,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:32:00,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 433.40826 ± 186.681
2025-05-13 13:32:00,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [544.58, 477.06964, 562.15894, 80.37999, 597.45074, 353.23322, 553.53894, 530.0899, 84.14933, 551.43164]
2025-05-13 13:32:00,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 157.0, 175.0, 49.0, 204.0, 145.0, 174.0, 166.0, 48.0, 172.0]
2025-05-13 13:32:00,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 32 minutes, 38 seconds)
2025-05-13 13:34:56,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:34:58,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 443.52261 ± 227.482
2025-05-13 13:34:58,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [609.30817, 303.82715, 500.69034, 564.6933, 11.89375, 38.528694, 640.88416, 598.16534, 576.96497, 590.2702]
2025-05-13 13:34:58,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 133.0, 234.0, 179.0, 16.0, 43.0, 240.0, 190.0, 192.0, 211.0]
2025-05-13 13:34:58,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 29 minutes, 42 seconds)
2025-05-13 13:37:54,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:37:56,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 381.12750 ± 253.680
2025-05-13 13:37:56,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [563.2483, 172.44435, 660.42834, 41.368725, 543.9731, 565.0582, 579.48096, 59.1654, 29.544174, 596.5632]
2025-05-13 13:37:56,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [178.0, 80.0, 213.0, 59.0, 179.0, 198.0, 182.0, 39.0, 32.0, 202.0]
2025-05-13 13:37:56,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 26 minutes, 40 seconds)
2025-05-13 13:40:53,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:40:55,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 508.11963 ± 99.716
2025-05-13 13:40:55,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [288.033, 575.2691, 586.589, 608.1356, 366.79358, 567.9634, 496.5713, 498.3446, 588.91864, 504.5781]
2025-05-13 13:40:55,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 179.0, 184.0, 213.0, 129.0, 180.0, 158.0, 159.0, 183.0, 159.0]
2025-05-13 13:40:55,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 23 minutes, 47 seconds)
2025-05-13 13:43:52,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:43:54,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 361.99579 ± 198.697
2025-05-13 13:43:54,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [612.9547, 528.3092, 89.49207, 274.9209, 270.3656, 674.91583, 533.06714, 169.36343, 142.44812, 324.12082]
2025-05-13 13:43:54,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 167.0, 66.0, 111.0, 122.0, 230.0, 167.0, 79.0, 89.0, 136.0]
2025-05-13 13:43:54,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 20 minutes, 46 seconds)
2025-05-13 13:46:49,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:46:51,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 479.32031 ± 164.952
2025-05-13 13:46:51,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [527.49677, 543.88635, 563.0921, 581.9448, 396.4199, 613.2651, 222.77815, 641.68506, 129.62683, 573.0082]
2025-05-13 13:46:51,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [166.0, 171.0, 177.0, 183.0, 147.0, 222.0, 97.0, 228.0, 64.0, 178.0]
2025-05-13 13:46:51,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 49 seconds)
2025-05-13 13:49:47,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:49:49,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 395.22699 ± 228.729
2025-05-13 13:49:49,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [448.55106, 198.9527, 52.13648, 626.07837, 379.9649, 689.0657, 609.0433, 142.48387, 163.71918, 642.2743]
2025-05-13 13:49:49,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 92.0, 44.0, 211.0, 135.0, 223.0, 188.0, 70.0, 86.0, 205.0]
2025-05-13 13:49:49,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 50 seconds)
2025-05-13 13:52:46,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:52:48,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 434.02280 ± 149.353
2025-05-13 13:52:48,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [557.17145, 580.79987, 287.8662, 633.2275, 212.04358, 289.4204, 565.13336, 522.9349, 430.86743, 260.76328]
2025-05-13 13:52:48,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 184.0, 111.0, 216.0, 92.0, 120.0, 195.0, 165.0, 149.0, 105.0]
2025-05-13 13:52:48,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 53 seconds)
2025-05-13 13:55:45,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:55:47,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 417.86499 ± 190.462
2025-05-13 13:55:47,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [614.8117, 535.9506, 272.86447, 34.922653, 579.88605, 228.28236, 530.30505, 538.72424, 584.06525, 258.83777]
2025-05-13 13:55:47,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [218.0, 169.0, 108.0, 37.0, 183.0, 97.0, 167.0, 170.0, 184.0, 104.0]
2025-05-13 13:55:47,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 54 seconds)
2025-05-13 13:58:45,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:58:47,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 355.28278 ± 212.387
2025-05-13 13:58:47,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [622.7647, 214.24641, 567.8339, 72.4783, 607.565, 212.11563, 277.72867, 571.2718, 363.95804, 42.86555]
2025-05-13 13:58:47,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [234.0, 96.0, 191.0, 46.0, 204.0, 92.0, 111.0, 197.0, 137.0, 45.0]
2025-05-13 13:58:47,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 57 seconds)
2025-05-13 14:01:48,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:01:50,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 338.07480 ± 250.154
2025-05-13 14:01:50,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [162.75543, 408.93106, 621.3825, 566.58875, 695.46893, 585.58624, 180.68091, 59.5909, 57.05216, 42.71083]
2025-05-13 14:01:50,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 152.0, 217.0, 181.0, 274.0, 181.0, 83.0, 65.0, 35.0, 50.0]
2025-05-13 14:01:50,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 59 seconds)
2025-05-13 14:04:51,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:04:52,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 448.21255 ± 165.096
2025-05-13 14:04:52,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [265.06378, 478.8805, 449.73407, 520.00134, 65.777245, 630.2668, 468.54373, 392.53195, 620.3762, 590.9498]
2025-05-13 14:04:52,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 170.0, 150.0, 168.0, 43.0, 216.0, 160.0, 137.0, 194.0, 183.0]
2025-05-13 14:04:52,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1251 [DEBUG]: Training session finished
