2025-05-13 09:06:26,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mda-mem4
2025-05-13 09:06:26,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-bpql-mda-mem4
2025-05-13 09:06:26,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x154ebde09290>}
2025-05-13 09:06:26,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:26,259 baseline-bpql-mda-noisy-hopper:91 [WARNING]: args.assumed_delay != args.horizon: 4 != 32
2025-05-13 09:06:26,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-13 09:06:26,284 baseline-bpql-mda-noisy-hopper:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-13 09:06:26,284 baseline-bpql-mda-noisy-hopper:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:26,289 baseline-bpql-mda-noisy-hopper:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(3, 384, batch_first=True)
)
2025-05-13 09:06:27,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:27,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-13 09:09:31,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:09:32,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 73.97131 ± 7.748
2025-05-13 09:09:32,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [70.629616, 67.62699, 96.02079, 73.06069, 71.09029, 75.181915, 68.92041, 75.12673, 73.01648, 69.039185]
2025-05-13 09:09:32,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [38.0, 36.0, 50.0, 39.0, 38.0, 40.0, 37.0, 40.0, 39.0, 37.0]
2025-05-13 09:09:32,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (73.97) for latency ExtremeSparseL4U32
2025-05-13 09:09:32,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 5 minutes)
2025-05-13 09:12:45,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:12:47,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 45.87561 ± 36.276
2025-05-13 09:12:47,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [29.337172, 27.039783, 129.08081, 25.690504, 28.59443, 26.0328, 27.347803, 26.71669, 32.813225, 106.102844]
2025-05-13 09:12:47,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 52.0, 139.0, 52.0, 54.0, 51.0, 54.0, 54.0, 57.0, 255.0]
2025-05-13 09:12:47,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 10 minutes, 11 seconds)
2025-05-13 09:16:02,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:16:04,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 213.22441 ± 94.670
2025-05-13 09:16:04,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [341.07062, 185.3443, 261.59225, 141.37608, 246.2738, 255.88338, 116.698944, 137.67407, 376.242, 70.0889]
2025-05-13 09:16:04,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 131.0, 172.0, 103.0, 189.0, 193.0, 81.0, 93.0, 232.0, 64.0]
2025-05-13 09:16:04,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (213.22) for latency ExtremeSparseL4U32
2025-05-13 09:16:04,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 11 minutes, 11 seconds)
2025-05-13 09:19:17,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:19:19,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 138.51089 ± 54.486
2025-05-13 09:19:19,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [33.03221, 164.70952, 149.65959, 61.394558, 198.77718, 226.98262, 144.59679, 139.93175, 121.9404, 144.08427]
2025-05-13 09:19:19,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 130.0, 197.0, 101.0, 245.0, 215.0, 159.0, 158.0, 132.0, 119.0]
2025-05-13 09:19:19,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 8 minutes, 51 seconds)
2025-05-13 09:22:45,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:22:46,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 113.02643 ± 27.552
2025-05-13 09:22:46,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [125.44544, 65.36691, 139.0474, 123.572266, 111.40198, 161.03473, 122.818695, 72.58415, 94.67462, 114.31808]
2025-05-13 09:22:46,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 58.0, 122.0, 107.0, 98.0, 142.0, 111.0, 66.0, 88.0, 102.0]
2025-05-13 09:22:46,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 10 minutes, 9 seconds)
2025-05-13 09:25:49,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:25:52,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 279.64807 ± 155.473
2025-05-13 09:25:52,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [502.05356, 235.01286, 241.22594, 113.31544, 476.35202, 111.99443, 248.19858, 40.44131, 409.50793, 418.37872]
2025-05-13 09:25:52,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [322.0, 159.0, 184.0, 80.0, 298.0, 73.0, 185.0, 49.0, 234.0, 240.0]
2025-05-13 09:25:52,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (279.65) for latency ExtremeSparseL4U32
2025-05-13 09:25:52,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 7 minutes, 4 seconds)
2025-05-13 09:29:06,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:29:07,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 137.24002 ± 40.245
2025-05-13 09:29:07,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [184.48172, 160.1788, 148.27155, 132.6773, 100.06212, 36.911945, 177.43338, 140.7383, 139.76433, 151.88089]
2025-05-13 09:29:07,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 131.0, 120.0, 97.0, 65.0, 40.0, 151.0, 109.0, 108.0, 123.0]
2025-05-13 09:29:07,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 3 minutes, 56 seconds)
2025-05-13 09:32:18,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:32:20,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 156.91238 ± 104.302
2025-05-13 09:32:20,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [37.95463, 29.893274, 224.97719, 199.42674, 220.9087, 312.40036, 45.126312, 166.2299, 41.52381, 290.6828]
2025-05-13 09:32:20,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 34.0, 145.0, 156.0, 179.0, 142.0, 40.0, 107.0, 30.0, 242.0]
2025-05-13 09:32:20,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 59 minutes, 10 seconds)
2025-05-13 09:35:34,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:35:35,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 219.11856 ± 75.874
2025-05-13 09:35:35,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [237.81152, 211.81395, 254.89601, 261.30798, 106.65166, 43.198826, 256.14496, 268.02496, 262.43448, 288.90137]
2025-05-13 09:35:35,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 97.0, 110.0, 117.0, 61.0, 42.0, 130.0, 122.0, 113.0, 123.0]
2025-05-13 09:35:35,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 56 minutes, 1 second)
2025-05-13 09:38:44,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:38:46,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 266.58157 ± 43.684
2025-05-13 09:38:46,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [279.26566, 255.18109, 300.2994, 274.94553, 312.26904, 229.77032, 159.12383, 313.09232, 256.89557, 284.97287]
2025-05-13 09:38:46,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 114.0, 132.0, 126.0, 146.0, 111.0, 109.0, 142.0, 120.0, 172.0]
2025-05-13 09:38:46,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 47 minutes, 59 seconds)
2025-05-13 09:42:02,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:42:04,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 302.92505 ± 91.945
2025-05-13 09:42:04,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [182.15184, 388.25092, 357.09894, 183.80423, 129.3539, 375.03247, 355.67926, 349.0686, 356.3837, 352.4265]
2025-05-13 09:42:04,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 204.0, 168.0, 99.0, 83.0, 190.0, 165.0, 164.0, 164.0, 163.0]
2025-05-13 09:42:04,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (302.93) for latency ExtremeSparseL4U32
2025-05-13 09:42:04,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 48 minutes, 29 seconds)
2025-05-13 09:45:14,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:45:16,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 318.16925 ± 78.224
2025-05-13 09:45:16,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [255.3791, 240.99077, 368.9977, 375.29135, 391.32413, 138.70781, 339.52484, 389.36835, 315.43353, 366.67493]
2025-05-13 09:45:16,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 120.0, 189.0, 181.0, 200.0, 79.0, 195.0, 196.0, 182.0, 172.0]
2025-05-13 09:45:16,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (318.17) for latency ExtremeSparseL4U32
2025-05-13 09:45:16,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 44 minutes, 18 seconds)
2025-05-13 09:48:29,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:48:31,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 214.64346 ± 146.523
2025-05-13 09:48:31,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [125.2661, 61.45047, 103.12975, 67.78233, 455.30664, 95.781975, 286.16855, 479.12753, 260.53937, 211.88176]
2025-05-13 09:48:31,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 65.0, 78.0, 69.0, 268.0, 64.0, 185.0, 291.0, 163.0, 155.0]
2025-05-13 09:48:31,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 41 minutes, 36 seconds)
2025-05-13 09:51:40,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:51:42,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 293.10852 ± 107.715
2025-05-13 09:51:42,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [361.20566, 173.69551, 389.39658, 362.21268, 352.42023, 372.63495, 317.64243, 51.219444, 193.1859, 357.4719]
2025-05-13 09:51:42,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [178.0, 97.0, 203.0, 178.0, 167.0, 189.0, 149.0, 51.0, 115.0, 174.0]
2025-05-13 09:51:42,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 37 minutes, 19 seconds)
2025-05-13 09:54:53,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:54:55,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 259.11188 ± 125.162
2025-05-13 09:54:55,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [61.782703, 382.44418, 364.18628, 303.16687, 352.82056, 351.70468, 141.13087, 241.88426, 37.908016, 354.0902]
2025-05-13 09:54:55,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 206.0, 179.0, 179.0, 164.0, 168.0, 82.0, 134.0, 44.0, 169.0]
2025-05-13 09:54:55,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 34 minutes, 27 seconds)
2025-05-13 09:58:10,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:58:11,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 180.39005 ± 147.841
2025-05-13 09:58:11,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [37.719402, 207.22096, 375.7668, 345.46747, 50.11161, 31.93272, 20.38591, 53.800064, 339.10675, 342.38864]
2025-05-13 09:58:11,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 109.0, 195.0, 152.0, 30.0, 37.0, 26.0, 61.0, 148.0, 150.0]
2025-05-13 09:58:11,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 30 minutes, 46 seconds)
2025-05-13 10:01:22,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:01:24,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 342.94537 ± 10.882
2025-05-13 10:01:24,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [346.21307, 340.79932, 342.3377, 335.90765, 345.82178, 316.76428, 362.4173, 349.02164, 345.05698, 345.11432]
2025-05-13 10:01:24,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [154.0, 149.0, 149.0, 150.0, 155.0, 140.0, 174.0, 157.0, 151.0, 151.0]
2025-05-13 10:01:24,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (342.95) for latency ExtremeSparseL4U32
2025-05-13 10:01:24,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 27 minutes, 41 seconds)
2025-05-13 10:04:35,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:04:37,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 330.85928 ± 108.991
2025-05-13 10:04:37,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [373.216, 384.66464, 277.9552, 45.04262, 452.41345, 366.6648, 394.42776, 259.13458, 375.13245, 379.94135]
2025-05-13 10:04:37,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 187.0, 150.0, 42.0, 260.0, 173.0, 197.0, 149.0, 184.0, 186.0]
2025-05-13 10:04:37,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 24 minutes, 3 seconds)
2025-05-13 10:07:51,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:07:53,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 290.45786 ± 116.780
2025-05-13 10:07:53,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [141.62758, 363.50925, 360.72586, 89.82742, 369.84784, 239.96063, 466.92154, 340.42316, 165.86815, 365.86728]
2025-05-13 10:07:53,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 173.0, 171.0, 55.0, 177.0, 125.0, 277.0, 155.0, 94.0, 175.0]
2025-05-13 10:07:53,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 22 minutes)
2025-05-13 10:11:04,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:11:05,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 234.09387 ± 152.533
2025-05-13 10:11:05,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [381.50522, 360.98825, 340.82785, 301.2514, 26.356222, 378.4764, 81.75557, 26.88447, 65.779335, 377.11395]
2025-05-13 10:11:05,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 161.0, 156.0, 142.0, 32.0, 191.0, 49.0, 32.0, 83.0, 173.0]
2025-05-13 10:11:05,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 18 minutes, 47 seconds)
2025-05-13 10:14:20,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:14:22,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 262.66547 ± 105.950
2025-05-13 10:14:22,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [289.5246, 353.1205, 148.31891, 246.95384, 171.02095, 335.17938, 330.68127, 33.510128, 341.76352, 376.5816]
2025-05-13 10:14:22,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 181.0, 82.0, 113.0, 111.0, 154.0, 154.0, 37.0, 158.0, 178.0]
2025-05-13 10:14:22,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 15 minutes, 41 seconds)
2025-05-13 10:17:32,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:17:34,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 229.68958 ± 147.430
2025-05-13 10:17:34,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [355.91245, 341.94583, 110.69739, 47.296436, 45.323357, 220.30447, 37.78365, 426.6494, 360.21878, 350.76398]
2025-05-13 10:17:34,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 161.0, 60.0, 51.0, 54.0, 122.0, 39.0, 282.0, 172.0, 160.0]
2025-05-13 10:17:34,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 12 minutes, 15 seconds)
2025-05-13 10:20:45,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:20:47,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 291.52292 ± 126.155
2025-05-13 10:20:47,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [193.24619, 463.48373, 363.41864, 225.78094, 406.87274, 182.13173, 357.12668, 65.68838, 445.11035, 212.36983]
2025-05-13 10:20:47,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 312.0, 168.0, 120.0, 218.0, 100.0, 163.0, 40.0, 269.0, 121.0]
2025-05-13 10:20:47,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 8 minutes, 57 seconds)
2025-05-13 10:24:00,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:24:02,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 311.28772 ± 96.121
2025-05-13 10:24:02,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [366.78632, 69.98139, 314.09543, 370.02322, 190.44647, 380.359, 358.93124, 337.6635, 368.7473, 355.84317]
2025-05-13 10:24:02,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 46.0, 155.0, 177.0, 106.0, 225.0, 169.0, 154.0, 181.0, 164.0]
2025-05-13 10:24:02,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 5 minutes, 34 seconds)
2025-05-13 10:27:12,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:27:14,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 233.64758 ± 157.574
2025-05-13 10:27:14,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [425.82956, 351.7565, 125.76051, 367.8126, 57.95126, 395.29016, 63.56866, 33.131725, 399.964, 115.41098]
2025-05-13 10:27:14,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [237.0, 156.0, 73.0, 208.0, 68.0, 233.0, 39.0, 39.0, 207.0, 68.0]
2025-05-13 10:27:14,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 2 minutes, 14 seconds)
2025-05-13 10:30:28,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:30:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 301.40002 ± 122.385
2025-05-13 10:30:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [383.71396, 377.7667, 348.13843, 348.36078, 80.94026, 48.194935, 360.0953, 293.8903, 418.8568, 354.04263]
2025-05-13 10:30:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 186.0, 154.0, 153.0, 53.0, 60.0, 156.0, 142.0, 223.0, 158.0]
2025-05-13 10:30:30,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 58 minutes, 46 seconds)
2025-05-13 10:33:43,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:33:45,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 274.28851 ± 117.097
2025-05-13 10:33:45,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [348.43005, 270.39786, 253.37163, 360.5917, 372.60748, 383.7686, 78.49867, 46.163662, 379.5429, 249.51236]
2025-05-13 10:33:45,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 158.0, 127.0, 169.0, 177.0, 190.0, 52.0, 28.0, 195.0, 127.0]
2025-05-13 10:33:45,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 56 minutes, 10 seconds)
2025-05-13 10:36:54,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:36:56,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 268.27441 ± 86.334
2025-05-13 10:36:56,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [195.76743, 116.8935, 349.1612, 342.91977, 229.04398, 397.04007, 302.53903, 185.93964, 219.21997, 344.2196]
2025-05-13 10:36:56,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 76.0, 154.0, 151.0, 113.0, 205.0, 140.0, 109.0, 173.0, 150.0]
2025-05-13 10:36:56,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 52 minutes, 32 seconds)
2025-05-13 10:40:08,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:40:09,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 245.92155 ± 145.574
2025-05-13 10:40:09,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [392.74796, 177.82228, 33.36353, 56.44291, 356.0788, 32.59552, 357.33124, 321.24155, 356.54617, 375.04565]
2025-05-13 10:40:09,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [202.0, 106.0, 34.0, 54.0, 162.0, 52.0, 172.0, 152.0, 163.0, 182.0]
2025-05-13 10:40:09,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 48 minutes, 56 seconds)
2025-05-13 10:43:21,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:43:23,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 206.08276 ± 147.041
2025-05-13 10:43:23,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [37.267612, 110.29977, 64.90094, 36.837475, 57.960064, 332.221, 334.55304, 369.49805, 324.35687, 392.93283]
2025-05-13 10:43:23,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 81.0, 70.0, 38.0, 57.0, 139.0, 140.0, 172.0, 133.0, 190.0]
2025-05-13 10:43:23,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 45 minutes, 59 seconds)
2025-05-13 10:46:36,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:46:38,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 282.64044 ± 110.139
2025-05-13 10:46:38,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [349.6014, 345.0393, 339.24857, 353.4486, 110.7964, 341.01578, 250.64566, 351.11774, 34.792877, 350.69803]
2025-05-13 10:46:38,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 150.0, 138.0, 157.0, 62.0, 150.0, 113.0, 155.0, 37.0, 156.0]
2025-05-13 10:46:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 42 minutes, 38 seconds)
2025-05-13 10:49:51,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:49:53,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 238.55904 ± 132.634
2025-05-13 10:49:53,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [311.39328, 67.55752, 323.0555, 143.86333, 380.27982, 77.53402, 35.734974, 380.23764, 322.60745, 343.32675]
2025-05-13 10:49:53,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 45.0, 196.0, 106.0, 221.0, 50.0, 38.0, 188.0, 130.0, 148.0]
2025-05-13 10:49:53,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 39 minutes, 24 seconds)
2025-05-13 10:53:03,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:53:04,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 297.79019 ± 90.585
2025-05-13 10:53:04,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [334.90637, 331.14447, 332.5329, 323.181, 329.31247, 313.5509, 27.038061, 316.21686, 329.73215, 340.28677]
2025-05-13 10:53:04,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 141.0, 140.0, 134.0, 137.0, 137.0, 23.0, 176.0, 138.0, 149.0]
2025-05-13 10:53:04,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 36 minutes, 19 seconds)
2025-05-13 10:56:18,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:56:20,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 185.38330 ± 122.568
2025-05-13 10:56:20,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [62.5493, 172.14923, 402.39865, 347.0325, 189.20863, 37.710415, 135.72136, 306.8056, 164.04494, 36.212353]
2025-05-13 10:56:20,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [43.0, 104.0, 195.0, 155.0, 106.0, 50.0, 85.0, 149.0, 93.0, 43.0]
2025-05-13 10:56:20,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 33 minutes, 26 seconds)
2025-05-13 10:59:31,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:59:33,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 322.80527 ± 77.795
2025-05-13 10:59:33,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [363.09802, 332.76254, 349.45642, 353.05603, 314.734, 351.81866, 354.3024, 342.3208, 93.87377, 372.63004]
2025-05-13 10:59:33,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 139.0, 143.0, 147.0, 132.0, 141.0, 140.0, 139.0, 56.0, 173.0]
2025-05-13 10:59:33,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 30 minutes, 16 seconds)
2025-05-13 11:02:44,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:02:45,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 127.70325 ± 113.261
2025-05-13 11:02:45,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [35.195526, 138.12509, 348.65335, 343.59805, 56.10084, 82.03538, 107.2543, 79.0049, 46.003887, 41.061153]
2025-05-13 11:02:45,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [36.0, 95.0, 159.0, 149.0, 66.0, 51.0, 70.0, 55.0, 45.0, 48.0]
2025-05-13 11:02:45,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 26 minutes, 17 seconds)
2025-05-13 11:05:58,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:05:59,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 223.19266 ± 92.734
2025-05-13 11:05:59,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [335.69858, 337.28046, 203.7393, 155.84804, 287.0038, 148.56583, 110.312675, 292.69052, 69.64167, 291.1455]
2025-05-13 11:05:59,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 146.0, 101.0, 82.0, 122.0, 81.0, 81.0, 120.0, 45.0, 159.0]
2025-05-13 11:05:59,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 22 minutes, 57 seconds)
2025-05-13 11:09:12,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:09:14,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 265.33911 ± 139.911
2025-05-13 11:09:14,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [383.0173, 333.71915, 380.49896, 342.2969, 270.65076, 36.601192, 371.42532, 390.13828, 119.999825, 25.043497]
2025-05-13 11:09:14,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [185.0, 145.0, 184.0, 190.0, 143.0, 51.0, 171.0, 180.0, 69.0, 30.0]
2025-05-13 11:09:14,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 20 minutes, 20 seconds)
2025-05-13 11:12:24,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:12:25,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 277.47443 ± 121.471
2025-05-13 11:12:25,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [360.03137, 349.467, 103.14759, 408.98975, 342.5913, 355.36838, 153.52718, 309.85583, 350.53394, 41.23165]
2025-05-13 11:12:25,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 144.0, 59.0, 210.0, 141.0, 153.0, 76.0, 153.0, 145.0, 24.0]
2025-05-13 11:12:25,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 16 minutes, 24 seconds)
2025-05-13 11:15:39,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:15:40,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 269.14288 ± 109.858
2025-05-13 11:15:40,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [358.2298, 379.11594, 364.49435, 139.77692, 150.74098, 329.34122, 353.67975, 364.16193, 125.74227, 126.14584]
2025-05-13 11:15:40,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [164.0, 178.0, 177.0, 113.0, 119.0, 155.0, 165.0, 172.0, 73.0, 74.0]
2025-05-13 11:15:40,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 13 minutes, 25 seconds)
2025-05-13 11:18:51,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:18:53,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 224.03691 ± 122.518
2025-05-13 11:18:53,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [341.777, 347.65234, 246.13962, 48.619194, 35.18735, 120.67547, 371.01007, 166.46848, 358.40536, 204.43442]
2025-05-13 11:18:53,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 151.0, 139.0, 57.0, 40.0, 71.0, 172.0, 94.0, 160.0, 131.0]
2025-05-13 11:18:53,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 10 minutes, 19 seconds)
2025-05-13 11:22:08,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:22:10,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 308.82266 ± 90.193
2025-05-13 11:22:10,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [361.3213, 355.01346, 354.0966, 327.5717, 367.6923, 349.60587, 344.75507, 176.12161, 91.27292, 360.77588]
2025-05-13 11:22:10,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [165.0, 157.0, 158.0, 148.0, 138.0, 154.0, 152.0, 122.0, 71.0, 161.0]
2025-05-13 11:22:10,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 7 minutes, 44 seconds)
2025-05-13 11:25:22,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:25:23,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 250.50977 ± 131.552
2025-05-13 11:25:23,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [331.42334, 354.1759, 342.7428, 336.50507, 24.214132, 82.349205, 336.22433, 48.48154, 299.31006, 349.67126]
2025-05-13 11:25:23,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 152.0, 147.0, 140.0, 41.0, 51.0, 148.0, 61.0, 134.0, 157.0]
2025-05-13 11:25:23,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 4 minutes, 14 seconds)
2025-05-13 11:28:34,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:28:36,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 342.76419 ± 114.749
2025-05-13 11:28:36,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [369.79047, 30.750256, 396.61978, 271.07385, 442.5819, 425.99686, 353.95737, 353.93466, 434.268, 348.66867]
2025-05-13 11:28:36,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 32.0, 167.0, 137.0, 197.0, 177.0, 150.0, 149.0, 183.0, 151.0]
2025-05-13 11:28:36,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 1 minute, 6 seconds)
2025-05-13 11:31:48,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:31:50,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 376.86627 ± 72.330
2025-05-13 11:31:50,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [363.86813, 433.3219, 399.82248, 398.9489, 397.45517, 387.80267, 165.62416, 406.82236, 408.9424, 406.0546]
2025-05-13 11:31:50,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 211.0, 163.0, 161.0, 178.0, 164.0, 92.0, 163.0, 162.0, 163.0]
2025-05-13 11:31:50,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (376.87) for latency ExtremeSparseL4U32
2025-05-13 11:31:50,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 57 minutes, 42 seconds)
2025-05-13 11:35:04,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:35:06,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 271.09973 ± 120.517
2025-05-13 11:35:06,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [244.58473, 12.831951, 367.67853, 338.02585, 332.06912, 65.12693, 345.0612, 341.0742, 336.60287, 327.9422]
2025-05-13 11:35:06,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 15.0, 165.0, 145.0, 139.0, 40.0, 148.0, 151.0, 144.0, 137.0]
2025-05-13 11:35:06,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 55 minutes, 7 seconds)
2025-05-13 11:38:17,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:38:19,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 290.46030 ± 161.727
2025-05-13 11:38:19,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [428.99686, 436.46136, 433.24274, 35.617004, 382.9629, 377.5659, 229.36029, 435.10486, 68.716255, 76.574585]
2025-05-13 11:38:19,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [161.0, 170.0, 164.0, 28.0, 158.0, 146.0, 112.0, 175.0, 42.0, 84.0]
2025-05-13 11:38:19,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 51 minutes, 7 seconds)
2025-05-13 11:41:30,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:41:32,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 324.51398 ± 125.864
2025-05-13 11:41:32,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [424.26004, 338.83914, 389.0412, 129.86073, 41.68347, 419.22333, 434.79385, 393.6401, 337.1911, 336.60678]
2025-05-13 11:41:32,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 144.0, 151.0, 67.0, 49.0, 171.0, 152.0, 145.0, 141.0, 138.0]
2025-05-13 11:41:32,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 47 minutes, 48 seconds)
2025-05-13 11:44:44,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:44:45,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 273.27887 ± 160.799
2025-05-13 11:44:45,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [30.651752, 474.25134, 322.69696, 71.60397, 334.5369, 325.01315, 514.49817, 353.60324, 242.892, 63.040916]
2025-05-13 11:44:45,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 184.0, 129.0, 43.0, 133.0, 129.0, 175.0, 137.0, 104.0, 56.0]
2025-05-13 11:44:45,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 44 minutes, 51 seconds)
2025-05-13 11:47:59,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:48:00,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 366.65442 ± 178.898
2025-05-13 11:48:00,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [459.30026, 506.3243, 529.2024, 455.97266, 24.302216, 456.95926, 474.121, 487.0882, 123.90681, 149.36723]
2025-05-13 11:48:00,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 185.0, 179.0, 160.0, 25.0, 199.0, 166.0, 169.0, 62.0, 158.0]
2025-05-13 11:48:00,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 41 minutes, 47 seconds)
2025-05-13 11:51:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:51:14,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 319.81241 ± 166.019
2025-05-13 11:51:14,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [209.40161, 522.7092, 315.5693, 511.75034, 515.36035, 71.76698, 73.74305, 382.77527, 397.8571, 197.19067]
2025-05-13 11:51:14,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 206.0, 137.0, 185.0, 188.0, 44.0, 49.0, 186.0, 158.0, 97.0]
2025-05-13 11:51:14,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 38 minutes, 9 seconds)
2025-05-13 11:54:26,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:54:28,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 294.89639 ± 186.495
2025-05-13 11:54:28,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [523.0346, 20.499926, 476.15402, 314.91547, 457.56226, 37.31117, 484.63492, 201.01242, 353.70544, 80.13381]
2025-05-13 11:54:28,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [189.0, 19.0, 174.0, 126.0, 176.0, 41.0, 202.0, 99.0, 140.0, 51.0]
2025-05-13 11:54:28,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 35 minutes, 2 seconds)
2025-05-13 11:57:42,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:57:44,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 399.58258 ± 183.442
2025-05-13 11:57:44,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [316.32193, 571.1763, 680.54486, 324.78568, 543.9738, 195.47864, 470.89856, 531.1571, 312.86984, 48.619007]
2025-05-13 11:57:44,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 203.0, 248.0, 137.0, 192.0, 95.0, 191.0, 196.0, 134.0, 34.0]
2025-05-13 11:57:44,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (399.58) for latency ExtremeSparseL4U32
2025-05-13 11:57:44,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 32 minutes, 18 seconds)
2025-05-13 12:00:58,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:01:02,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 701.81598 ± 376.337
2025-05-13 12:01:02,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [392.71494, 518.80444, 959.52875, 285.70416, 350.7447, 546.127, 1638.7648, 717.98035, 779.11774, 828.673]
2025-05-13 12:01:02,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 199.0, 326.0, 125.0, 146.0, 213.0, 595.0, 260.0, 263.0, 283.0]
2025-05-13 12:01:02,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (701.82) for latency ExtremeSparseL4U32
2025-05-13 12:01:02,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 29 minutes, 40 seconds)
2025-05-13 12:04:11,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:04:13,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 408.05057 ± 386.618
2025-05-13 12:04:13,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [545.5723, 271.098, 248.55559, 252.14124, 799.2763, 124.517685, 104.23867, 267.40536, 1385.579, 82.12124]
2025-05-13 12:04:13,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [225.0, 120.0, 117.0, 123.0, 279.0, 62.0, 85.0, 162.0, 522.0, 51.0]
2025-05-13 12:04:13,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 25 minutes, 54 seconds)
2025-05-13 12:07:28,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:07:31,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 435.17227 ± 256.976
2025-05-13 12:07:31,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [379.48346, 292.97818, 796.67236, 15.341963, 391.7929, 784.4105, 611.03955, 459.33325, 42.40111, 578.26917]
2025-05-13 12:07:31,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 121.0, 262.0, 17.0, 165.0, 317.0, 254.0, 181.0, 55.0, 204.0]
2025-05-13 12:07:31,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 23 minutes, 13 seconds)
2025-05-13 12:10:44,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:10:46,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 417.86273 ± 261.246
2025-05-13 12:10:46,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [68.05788, 69.47548, 348.73587, 580.6985, 558.8107, 593.59814, 240.0248, 624.90656, 194.4668, 899.85254]
2025-05-13 12:10:46,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [43.0, 53.0, 139.0, 204.0, 187.0, 203.0, 111.0, 253.0, 89.0, 340.0]
2025-05-13 12:10:46,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 20 minutes, 15 seconds)
2025-05-13 12:13:56,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:13:58,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 573.63672 ± 86.254
2025-05-13 12:13:58,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [653.68207, 426.39252, 559.2578, 722.63904, 580.1856, 486.52594, 498.79434, 660.061, 534.4317, 614.39764]
2025-05-13 12:13:58,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 201.0, 191.0, 259.0, 209.0, 179.0, 181.0, 222.0, 195.0, 219.0]
2025-05-13 12:13:58,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 16 minutes, 26 seconds)
2025-05-13 12:17:11,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:17:12,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 213.16336 ± 161.153
2025-05-13 12:17:12,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [496.14355, 43.380466, 57.00781, 29.344856, 389.7362, 92.297386, 256.38416, 416.74332, 157.15787, 193.43803]
2025-05-13 12:17:12,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 26.0, 67.0, 33.0, 155.0, 55.0, 131.0, 168.0, 91.0, 105.0]
2025-05-13 12:17:12,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 12 minutes, 40 seconds)
2025-05-13 12:20:23,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:20:24,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 297.57208 ± 293.058
2025-05-13 12:20:24,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [713.0114, 704.12006, 11.6980095, 57.544937, 53.61206, 37.50447, 340.79724, 302.43106, 31.233389, 723.76807]
2025-05-13 12:20:24,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [280.0, 265.0, 13.0, 58.0, 55.0, 53.0, 167.0, 169.0, 40.0, 244.0]
2025-05-13 12:20:24,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 9 minutes, 30 seconds)
2025-05-13 12:23:37,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:23:40,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 797.48535 ± 442.751
2025-05-13 12:23:40,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [557.4734, 799.7383, 825.37964, 163.96143, 1265.9559, 49.138515, 1335.0294, 1449.1843, 815.55634, 713.4358]
2025-05-13 12:23:40,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 258.0, 261.0, 93.0, 516.0, 58.0, 474.0, 475.0, 261.0, 245.0]
2025-05-13 12:23:40,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (797.49) for latency ExtremeSparseL4U32
2025-05-13 12:23:40,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 6 minutes, 4 seconds)
2025-05-13 12:26:53,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:26:55,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 374.14233 ± 312.838
2025-05-13 12:26:55,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [686.9923, 127.28395, 778.6512, 62.847492, 36.528015, 628.6522, 688.35095, 46.38104, 49.219692, 636.5167]
2025-05-13 12:26:55,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 63.0, 268.0, 41.0, 41.0, 249.0, 244.0, 33.0, 64.0, 252.0]
2025-05-13 12:26:55,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 2 minutes, 39 seconds)
2025-05-13 12:30:10,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:30:13,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 694.30286 ± 377.218
2025-05-13 12:30:13,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [788.54297, 577.6975, 75.76478, 691.5238, 928.7326, 47.66666, 780.2201, 782.7405, 1400.5995, 869.54016]
2025-05-13 12:30:13,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 222.0, 57.0, 274.0, 317.0, 45.0, 253.0, 257.0, 500.0, 276.0]
2025-05-13 12:30:13,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 11 seconds)
2025-05-13 12:33:28,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:33:32,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 759.63049 ± 252.918
2025-05-13 12:33:32,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [796.3169, 852.2895, 931.4373, 50.876392, 774.7295, 876.7003, 968.77075, 941.318, 685.3448, 718.5217]
2025-05-13 12:33:32,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [277.0, 316.0, 346.0, 53.0, 301.0, 307.0, 385.0, 327.0, 263.0, 294.0]
2025-05-13 12:33:32,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 57 minutes, 31 seconds)
2025-05-13 12:36:44,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:36:47,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 501.30118 ± 376.590
2025-05-13 12:36:47,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [931.9996, 858.40283, 945.4639, 723.9698, 47.439888, 38.93172, 68.554085, 851.46344, 384.69028, 162.09595]
2025-05-13 12:36:47,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [304.0, 313.0, 335.0, 241.0, 53.0, 38.0, 75.0, 287.0, 156.0, 80.0]
2025-05-13 12:36:47,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 54 minutes, 36 seconds)
2025-05-13 12:39:55,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:39:59,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 837.75372 ± 506.288
2025-05-13 12:39:59,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [45.212097, 372.59036, 1099.2922, 92.41018, 1394.3269, 815.67017, 873.78345, 862.53796, 1160.0922, 1661.6218]
2025-05-13 12:39:59,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 156.0, 361.0, 55.0, 486.0, 311.0, 312.0, 287.0, 388.0, 584.0]
2025-05-13 12:39:59,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (837.75) for latency ExtremeSparseL4U32
2025-05-13 12:39:59,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 50 minutes, 56 seconds)
2025-05-13 12:43:12,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:43:14,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 436.49130 ± 434.565
2025-05-13 12:43:14,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [96.48904, 1078.0759, 220.93062, 1130.0142, 45.56499, 43.286755, 503.5397, 239.4697, 16.94641, 990.596]
2025-05-13 12:43:14,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 423.0, 104.0, 438.0, 32.0, 37.0, 211.0, 134.0, 21.0, 347.0]
2025-05-13 12:43:14,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 47 minutes, 43 seconds)
2025-05-13 12:46:28,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:46:30,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 510.27676 ± 458.828
2025-05-13 12:46:30,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1131.8464, 37.767735, 41.684593, 92.042404, 152.98997, 84.99281, 1108.2096, 1118.597, 578.7264, 755.91064]
2025-05-13 12:46:30,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [358.0, 41.0, 52.0, 72.0, 79.0, 57.0, 371.0, 357.0, 217.0, 272.0]
2025-05-13 12:46:30,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 44 minutes, 14 seconds)
2025-05-13 12:49:44,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:49:46,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 595.67657 ± 315.309
2025-05-13 12:49:46,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [890.3633, 858.76404, 571.8762, 815.4605, 738.0529, 83.32787, 150.01213, 875.88727, 807.08124, 165.94034]
2025-05-13 12:49:46,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [283.0, 289.0, 204.0, 255.0, 250.0, 59.0, 73.0, 298.0, 253.0, 80.0]
2025-05-13 12:49:46,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 40 minutes, 40 seconds)
2025-05-13 12:53:00,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:53:03,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 658.83337 ± 393.029
2025-05-13 12:53:03,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [909.9638, 1009.3708, 1040.6799, 294.7978, 54.033703, 28.312668, 911.65326, 1024.4723, 413.08994, 901.95935]
2025-05-13 12:53:03,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [295.0, 311.0, 314.0, 122.0, 64.0, 34.0, 294.0, 318.0, 158.0, 289.0]
2025-05-13 12:53:03,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 37 minutes, 35 seconds)
2025-05-13 12:56:18,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:56:22,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 901.65369 ± 621.046
2025-05-13 12:56:22,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1411.7386, 298.31598, 375.02304, 1500.0996, 1545.211, 286.1459, 439.60776, 2027.5204, 788.3467, 344.52774]
2025-05-13 12:56:22,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [441.0, 124.0, 147.0, 459.0, 478.0, 123.0, 177.0, 628.0, 266.0, 172.0]
2025-05-13 12:56:22,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (901.65) for latency ExtremeSparseL4U32
2025-05-13 12:56:22,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 34 minutes, 56 seconds)
2025-05-13 12:59:29,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:59:32,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 695.25122 ± 508.714
2025-05-13 12:59:32,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [259.38416, 1624.9885, 259.71936, 398.06873, 1214.1299, 515.4619, 1152.8973, 178.84416, 186.15909, 1162.859]
2025-05-13 12:59:32,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 555.0, 124.0, 154.0, 418.0, 189.0, 363.0, 97.0, 96.0, 371.0]
2025-05-13 12:59:32,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 31 minutes, 15 seconds)
2025-05-13 13:02:46,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:02:50,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 896.92920 ± 661.781
2025-05-13 13:02:50,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1199.6482, 852.527, 1286.0741, 564.31976, 345.16617, 278.9392, 285.86453, 1894.3845, 2095.405, 166.96312]
2025-05-13 13:02:50,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [400.0, 285.0, 403.0, 208.0, 145.0, 146.0, 144.0, 589.0, 643.0, 79.0]
2025-05-13 13:02:50,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 28 minutes, 10 seconds)
2025-05-13 13:06:01,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:06:05,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 853.94641 ± 642.856
2025-05-13 13:06:05,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [418.51028, 1642.3943, 269.37378, 728.47156, 21.401045, 957.259, 1137.7422, 1996.2944, 44.928345, 1323.089]
2025-05-13 13:06:05,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [212.0, 585.0, 124.0, 237.0, 20.0, 305.0, 349.0, 711.0, 51.0, 446.0]
2025-05-13 13:06:05,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 24 minutes, 49 seconds)
2025-05-13 13:09:20,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:09:23,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 633.89746 ± 444.618
2025-05-13 13:09:23,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [768.86414, 1211.6394, 426.46823, 403.7783, 50.2678, 32.537586, 897.69763, 924.0182, 259.80368, 1363.8993]
2025-05-13 13:09:23,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [245.0, 384.0, 157.0, 162.0, 53.0, 38.0, 318.0, 331.0, 136.0, 445.0]
2025-05-13 13:09:23,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 21 minutes, 41 seconds)
2025-05-13 13:12:33,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:12:38,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1021.76886 ± 895.746
2025-05-13 13:12:38,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [475.16446, 132.03625, 523.2581, 1790.718, 1096.629, 648.48035, 33.99292, 2738.816, 2325.875, 452.71918]
2025-05-13 13:12:38,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 68.0, 226.0, 680.0, 421.0, 262.0, 38.0, 1000.0, 848.0, 184.0]
2025-05-13 13:12:38,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (1021.77) for latency ExtremeSparseL4U32
2025-05-13 13:12:38,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 18 minutes, 6 seconds)
2025-05-13 13:15:51,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:15:55,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 856.24628 ± 630.219
2025-05-13 13:15:55,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1979.715, 288.298, 892.9721, 378.5176, 303.78497, 1949.5956, 1134.0066, 561.26166, 917.2914, 157.02092]
2025-05-13 13:15:55,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [606.0, 123.0, 288.0, 151.0, 154.0, 591.0, 350.0, 210.0, 310.0, 76.0]
2025-05-13 13:15:55,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 15 minutes, 23 seconds)
2025-05-13 13:19:07,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:19:12,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1131.49109 ± 379.880
2025-05-13 13:19:12,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1127.1648, 1410.0746, 1756.9097, 1375.2289, 1382.2733, 1133.4197, 1290.1951, 524.79803, 589.3665, 725.4805]
2025-05-13 13:19:12,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [353.0, 443.0, 550.0, 429.0, 428.0, 363.0, 404.0, 199.0, 210.0, 258.0]
2025-05-13 13:19:12,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (1131.49) for latency ExtremeSparseL4U32
2025-05-13 13:19:12,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 12 minutes)
2025-05-13 13:22:24,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:22:27,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 546.30035 ± 365.502
2025-05-13 13:22:27,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [114.47914, 680.0752, 1186.3839, 308.57043, 21.028223, 871.6768, 916.95026, 735.27704, 351.60663, 276.95572]
2025-05-13 13:22:27,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 228.0, 391.0, 135.0, 23.0, 365.0, 355.0, 283.0, 154.0, 139.0]
2025-05-13 13:22:27,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 8 minutes, 45 seconds)
2025-05-13 13:25:39,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:25:42,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 658.56024 ± 475.843
2025-05-13 13:25:42,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1045.8969, 935.85065, 346.68613, 85.462204, 892.8393, 89.76711, 601.27954, 1129.7352, 1435.4442, 22.641327]
2025-05-13 13:25:42,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [356.0, 324.0, 142.0, 65.0, 290.0, 84.0, 256.0, 360.0, 449.0, 21.0]
2025-05-13 13:25:42,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 5 minutes, 16 seconds)
2025-05-13 13:28:57,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:29:00,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 677.80554 ± 480.067
2025-05-13 13:29:00,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [851.42816, 251.07947, 119.56088, 1187.3499, 1698.2101, 358.88525, 230.09294, 814.17755, 355.02145, 912.24945]
2025-05-13 13:29:00,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [312.0, 111.0, 79.0, 380.0, 516.0, 148.0, 102.0, 344.0, 168.0, 285.0]
2025-05-13 13:29:00,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 2 minutes, 13 seconds)
2025-05-13 13:32:10,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:32:12,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 346.73260 ± 332.546
2025-05-13 13:32:12,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [294.949, 1071.7902, 591.71185, 77.02226, 787.44055, 69.39079, 192.69676, 188.44896, 97.82542, 96.05055]
2025-05-13 13:32:12,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 415.0, 258.0, 53.0, 340.0, 70.0, 95.0, 98.0, 97.0, 79.0]
2025-05-13 13:32:12,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 58 minutes, 38 seconds)
2025-05-13 13:35:25,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:35:29,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 837.04022 ± 539.608
2025-05-13 13:35:29,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [713.43854, 831.09985, 1414.085, 331.7122, 1760.1746, 25.90343, 1196.068, 844.6107, 1165.4099, 87.900635]
2025-05-13 13:35:29,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [242.0, 288.0, 447.0, 148.0, 558.0, 31.0, 403.0, 264.0, 361.0, 53.0]
2025-05-13 13:35:29,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 55 minutes, 21 seconds)
2025-05-13 13:38:40,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:38:41,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 215.36707 ± 236.930
2025-05-13 13:38:41,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [679.9795, 30.734526, 53.45365, 102.868095, 29.185614, 432.41016, 578.0535, 140.1405, 29.867676, 76.977356]
2025-05-13 13:38:41,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [217.0, 35.0, 37.0, 79.0, 36.0, 179.0, 211.0, 82.0, 38.0, 60.0]
2025-05-13 13:38:41,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 51 minutes, 56 seconds)
2025-05-13 13:41:56,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:42:00,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 896.83087 ± 697.915
2025-05-13 13:42:00,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [821.73126, 697.2001, 492.6579, 204.19365, 47.942318, 1734.0052, 2220.378, 708.5252, 334.25476, 1707.4197]
2025-05-13 13:42:00,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [272.0, 227.0, 180.0, 113.0, 58.0, 594.0, 729.0, 229.0, 131.0, 540.0]
2025-05-13 13:42:00,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 48 minutes, 53 seconds)
2025-05-13 13:45:14,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:45:17,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 650.81134 ± 488.377
2025-05-13 13:45:17,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1744.2504, 275.64886, 776.4779, 101.48122, 625.3349, 694.79706, 175.45639, 256.75958, 1239.3278, 618.5795]
2025-05-13 13:45:17,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [554.0, 117.0, 244.0, 65.0, 218.0, 228.0, 87.0, 116.0, 388.0, 205.0]
2025-05-13 13:45:17,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 45 minutes, 33 seconds)
2025-05-13 13:48:27,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:48:30,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 620.19360 ± 388.274
2025-05-13 13:48:30,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [648.56165, 93.294846, 342.44342, 872.07056, 616.52466, 584.7829, 60.24576, 1245.0465, 1229.3546, 509.61154]
2025-05-13 13:48:30,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 54.0, 138.0, 274.0, 201.0, 214.0, 43.0, 450.0, 403.0, 180.0]
2025-05-13 13:48:30,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 42 minutes, 22 seconds)
2025-05-13 13:51:47,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:51:50,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 749.33197 ± 292.527
2025-05-13 13:51:50,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [817.8534, 227.30434, 799.7187, 915.0786, 1087.4869, 944.8184, 941.17236, 627.966, 193.69926, 938.2217]
2025-05-13 13:51:50,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [255.0, 100.0, 278.0, 288.0, 332.0, 297.0, 297.0, 207.0, 91.0, 310.0]
2025-05-13 13:51:50,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 39 minutes, 14 seconds)
2025-05-13 13:54:55,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:54:57,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 405.52405 ± 342.103
2025-05-13 13:54:57,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [46.862217, 904.1813, 685.7279, 560.96423, 830.7964, 242.71484, 27.848228, 36.529114, 676.06256, 43.55406]
2025-05-13 13:54:57,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 283.0, 221.0, 184.0, 265.0, 108.0, 44.0, 46.0, 236.0, 42.0]
2025-05-13 13:54:57,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 35 minutes, 47 seconds)
2025-05-13 13:58:10,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:58:12,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 301.48993 ± 376.985
2025-05-13 13:58:12,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [848.85547, 570.35315, 68.36288, 34.32857, 32.251335, 1112.966, 71.605705, 62.140076, 165.32973, 48.706226]
2025-05-13 13:58:12,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [268.0, 241.0, 57.0, 38.0, 36.0, 348.0, 44.0, 90.0, 80.0, 36.0]
2025-05-13 13:58:12,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 32 minutes, 24 seconds)
2025-05-13 14:01:26,577 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:01:29,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 752.46753 ± 349.556
2025-05-13 14:01:29,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [482.35803, 1266.2661, 510.48944, 1010.6366, 1019.6056, 1091.6461, 577.9336, 45.541977, 906.7509, 613.4467]
2025-05-13 14:01:29,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 415.0, 225.0, 311.0, 314.0, 381.0, 215.0, 45.0, 290.0, 201.0]
2025-05-13 14:01:29,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 29 minutes, 10 seconds)
2025-05-13 14:04:51,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:04:53,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 466.10077 ± 356.070
2025-05-13 14:04:53,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [47.137924, 701.8039, 148.57191, 680.751, 638.5486, 498.87125, 102.0398, 52.72696, 589.2916, 1201.2651]
2025-05-13 14:04:53,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 245.0, 76.0, 227.0, 211.0, 168.0, 60.0, 46.0, 208.0, 376.0]
2025-05-13 14:04:53,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 26 minutes, 12 seconds)
2025-05-13 14:07:54,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:07:57,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 753.36841 ± 442.019
2025-05-13 14:07:57,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [882.5436, 864.64777, 265.67825, 108.55922, 521.3114, 248.22272, 1458.8748, 822.302, 1397.8613, 963.68304]
2025-05-13 14:07:57,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [283.0, 295.0, 117.0, 73.0, 223.0, 120.0, 471.0, 261.0, 449.0, 327.0]
2025-05-13 14:07:57,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 22 minutes, 34 seconds)
2025-05-13 14:11:11,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:11:13,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 587.25775 ± 515.580
2025-05-13 14:11:13,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [955.8115, 1663.0546, 242.946, 659.65173, 877.46814, 1050.8943, 199.91544, 63.59557, 44.707886, 114.53233]
2025-05-13 14:11:14,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [314.0, 530.0, 110.0, 227.0, 275.0, 356.0, 128.0, 39.0, 45.0, 75.0]
2025-05-13 14:11:14,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 31 seconds)
2025-05-13 14:14:29,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:14:31,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 417.07959 ± 409.234
2025-05-13 14:14:31,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [434.26797, 437.8158, 1363.8191, 42.8819, 45.164997, 888.47034, 497.94247, 71.76822, 355.76135, 32.90373]
2025-05-13 14:14:31,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [169.0, 213.0, 468.0, 40.0, 49.0, 283.0, 194.0, 79.0, 146.0, 38.0]
2025-05-13 14:14:31,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 18 seconds)
2025-05-13 14:17:42,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:17:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 579.53772 ± 234.492
2025-05-13 14:17:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [29.86784, 356.37894, 920.66406, 653.6297, 643.8994, 655.28156, 667.4674, 767.7278, 653.4945, 446.96558]
2025-05-13 14:17:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [37.0, 142.0, 292.0, 215.0, 215.0, 221.0, 222.0, 294.0, 216.0, 167.0]
2025-05-13 14:17:45,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes)
2025-05-13 14:21:00,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:21:05,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 887.40741 ± 1033.736
2025-05-13 14:21:05,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [2756.0986, 675.5873, 531.2068, 49.097054, 481.45386, 282.11288, 975.5437, 2992.038, 66.934105, 64.000854]
2025-05-13 14:21:05,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 228.0, 185.0, 29.0, 216.0, 153.0, 411.0, 1000.0, 41.0, 47.0]
2025-05-13 14:21:05,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 43 seconds)
2025-05-13 14:24:14,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:24:16,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 437.14346 ± 547.244
2025-05-13 14:24:16,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [25.899126, 374.60858, 40.310913, 1346.3103, 346.0734, 37.319756, 506.66104, 36.600536, 1602.5868, 55.063923]
2025-05-13 14:24:16,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 151.0, 49.0, 460.0, 159.0, 46.0, 186.0, 41.0, 511.0, 68.0]
2025-05-13 14:24:16,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 31 seconds)
2025-05-13 14:27:30,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:27:34,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 889.25977 ± 991.234
2025-05-13 14:27:34,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [732.8776, 27.593992, 1025.0092, 308.2216, 43.887108, 67.561646, 221.73428, 2852.3596, 2622.1504, 991.20197]
2025-05-13 14:27:34,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [287.0, 37.0, 383.0, 134.0, 46.0, 41.0, 107.0, 1000.0, 1000.0, 373.0]
2025-05-13 14:27:34,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 16 seconds)
2025-05-13 14:30:55,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:31:01,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1312.37183 ± 868.116
2025-05-13 14:31:01,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [2067.071, 1327.8127, 1014.91736, 476.16357, 2775.3267, 1349.4125, 2598.6802, 506.89767, 965.18317, 42.25274]
2025-05-13 14:31:01,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [713.0, 485.0, 366.0, 166.0, 998.0, 506.0, 899.0, 199.0, 345.0, 42.0]
2025-05-13 14:31:01,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (1312.37) for latency ExtremeSparseL4U32
2025-05-13 14:31:01,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1251 [DEBUG]: Training session finished
