2025-05-13 09:06:31,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mda-mem4
2025-05-13 09:06:31,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mda-mem4
2025-05-13 09:06:31,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1552dd0d5c10>}
2025-05-13 09:06:31,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:31,915 baseline-bpql-mda-noisy-hopper:91 [WARNING]: args.assumed_delay != args.horizon: 4 != 24
2025-05-13 09:06:31,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-13 09:06:31,932 baseline-bpql-mda-noisy-hopper:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-13 09:06:31,932 baseline-bpql-mda-noisy-hopper:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:31,937 baseline-bpql-mda-noisy-hopper:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(3, 384, batch_first=True)
)
2025-05-13 09:06:32,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:32,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-13 09:09:35,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:09:35,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 72.26200 ± 9.851
2025-05-13 09:09:35,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [65.285965, 72.4652, 96.766914, 80.12804, 64.53339, 72.34932, 60.069515, 73.88104, 71.82047, 65.32012]
2025-05-13 09:09:35,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [36.0, 39.0, 54.0, 43.0, 35.0, 39.0, 33.0, 40.0, 39.0, 36.0]
2025-05-13 09:09:35,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (72.26) for latency MM1Queue_a033_s075
2025-05-13 09:09:35,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 1 minute, 49 seconds)
2025-05-13 09:12:48,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:12:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 255.54805 ± 155.997
2025-05-13 09:12:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [371.34177, 456.27295, 282.0995, 366.07025, 13.694689, 293.0252, 83.15261, 196.47523, 450.96176, 42.386417]
2025-05-13 09:12:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [213.0, 270.0, 185.0, 186.0, 16.0, 184.0, 54.0, 150.0, 264.0, 33.0]
2025-05-13 09:12:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (255.55) for latency MM1Queue_a033_s075
2025-05-13 09:12:50,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 8 minutes, 24 seconds)
2025-05-13 09:16:04,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:16:05,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 188.38057 ± 59.565
2025-05-13 09:16:05,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [212.42906, 218.50803, 175.25276, 232.45692, 228.2349, 212.40363, 232.27573, 63.653618, 83.91574, 224.67525]
2025-05-13 09:16:05,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 106.0, 89.0, 111.0, 111.0, 101.0, 112.0, 37.0, 45.0, 105.0]
2025-05-13 09:16:05,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 8 minutes, 39 seconds)
2025-05-13 09:19:15,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:19:16,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 197.76089 ± 53.978
2025-05-13 09:19:16,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [233.60356, 141.67105, 311.7654, 143.64398, 214.49927, 206.28499, 240.83711, 129.2173, 201.45325, 154.63307]
2025-05-13 09:19:16,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 75.0, 161.0, 74.0, 101.0, 93.0, 104.0, 69.0, 93.0, 83.0]
2025-05-13 09:19:16,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 5 minutes, 37 seconds)
2025-05-13 09:22:27,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:22:29,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 205.09555 ± 104.438
2025-05-13 09:22:29,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [293.9136, 52.407658, 294.6427, 316.51895, 99.88216, 299.04486, 57.984055, 109.94642, 257.5167, 269.09854]
2025-05-13 09:22:29,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 33.0, 141.0, 159.0, 57.0, 146.0, 36.0, 63.0, 118.0, 129.0]
2025-05-13 09:22:29,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 2 minutes, 55 seconds)
2025-05-13 09:25:41,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:25:43,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 240.88196 ± 23.813
2025-05-13 09:25:43,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [244.5668, 239.71617, 248.10759, 248.40675, 171.60686, 248.05717, 256.29468, 241.19661, 251.23077, 259.63617]
2025-05-13 09:25:43,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 106.0, 109.0, 109.0, 84.0, 108.0, 113.0, 105.0, 107.0, 112.0]
2025-05-13 09:25:43,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 3 minutes, 8 seconds)
2025-05-13 09:28:56,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:28:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 263.78369 ± 80.705
2025-05-13 09:28:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [332.42517, 306.07516, 108.55401, 296.74045, 305.6671, 118.20784, 319.8169, 315.26016, 315.9525, 219.13762]
2025-05-13 09:28:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 127.0, 57.0, 123.0, 130.0, 64.0, 140.0, 133.0, 130.0, 102.0]
2025-05-13 09:28:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (263.78) for latency MM1Queue_a033_s075
2025-05-13 09:28:57,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 59 minutes, 57 seconds)
2025-05-13 09:32:08,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:32:10,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 280.35724 ± 88.617
2025-05-13 09:32:10,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [258.7125, 185.34325, 394.77673, 171.14616, 329.46872, 380.76773, 129.81361, 266.62885, 362.23828, 324.67648]
2025-05-13 09:32:10,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 93.0, 163.0, 87.0, 149.0, 160.0, 70.0, 131.0, 173.0, 135.0]
2025-05-13 09:32:10,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (280.36) for latency MM1Queue_a033_s075
2025-05-13 09:32:10,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 55 minutes, 51 seconds)
2025-05-13 09:35:23,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:35:24,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 249.63045 ± 72.213
2025-05-13 09:35:24,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [239.27185, 323.82376, 325.28128, 260.46182, 184.37502, 181.24237, 116.33066, 209.54086, 320.79233, 335.18463]
2025-05-13 09:35:24,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 128.0, 127.0, 113.0, 86.0, 85.0, 61.0, 95.0, 124.0, 134.0]
2025-05-13 09:35:24,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 53 minutes, 34 seconds)
2025-05-13 09:38:36,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:38:38,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 283.60495 ± 135.172
2025-05-13 09:38:38,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [289.60913, 165.18082, 492.79657, 308.46146, 336.402, 270.9176, 491.61737, 83.88553, 307.48093, 89.69796]
2025-05-13 09:38:38,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 76.0, 163.0, 119.0, 130.0, 109.0, 191.0, 48.0, 129.0, 50.0]
2025-05-13 09:38:38,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (283.60) for latency MM1Queue_a033_s075
2025-05-13 09:38:38,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 50 minutes, 40 seconds)
2025-05-13 09:41:52,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:41:53,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 194.68747 ± 79.268
2025-05-13 09:41:53,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [47.90613, 175.82013, 313.73367, 112.29671, 114.964935, 254.24733, 247.2329, 245.85188, 260.18463, 174.6363]
2025-05-13 09:41:53,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 111.0, 122.0, 61.0, 68.0, 126.0, 116.0, 127.0, 120.0, 89.0]
2025-05-13 09:41:53,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 47 minutes, 50 seconds)
2025-05-13 09:45:03,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:45:05,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 386.68304 ± 278.801
2025-05-13 09:45:05,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [843.79236, 591.64417, 115.68936, 105.47141, 86.019394, 609.2773, 179.3703, 563.0797, 103.59425, 668.8922]
2025-05-13 09:45:05,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [272.0, 258.0, 80.0, 60.0, 51.0, 185.0, 101.0, 181.0, 70.0, 213.0]
2025-05-13 09:45:05,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (386.68) for latency MM1Queue_a033_s075
2025-05-13 09:45:05,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 43 minutes, 55 seconds)
2025-05-13 09:48:19,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:48:21,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 473.27725 ± 167.209
2025-05-13 09:48:21,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [477.60907, 323.601, 650.62427, 461.29895, 527.59326, 617.0197, 632.5516, 613.24963, 322.27524, 106.949234]
2025-05-13 09:48:21,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 117.0, 201.0, 153.0, 162.0, 189.0, 194.0, 188.0, 116.0, 58.0]
2025-05-13 09:48:21,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (473.28) for latency MM1Queue_a033_s075
2025-05-13 09:48:21,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 41 minutes, 45 seconds)
2025-05-13 09:51:34,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:51:36,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 447.13129 ± 210.914
2025-05-13 09:51:36,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [602.2964, 168.17761, 503.62436, 54.164513, 487.65497, 753.5662, 646.1212, 595.50507, 349.99118, 310.21127]
2025-05-13 09:51:36,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 84.0, 176.0, 33.0, 174.0, 255.0, 200.0, 208.0, 142.0, 135.0]
2025-05-13 09:51:36,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 38 minutes, 37 seconds)
2025-05-13 09:54:50,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:54:53,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 625.58917 ± 280.789
2025-05-13 09:54:53,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [110.144, 905.57684, 1019.71765, 461.81857, 206.4469, 761.7214, 553.7056, 658.19525, 725.41724, 853.1483]
2025-05-13 09:54:53,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 285.0, 319.0, 157.0, 97.0, 239.0, 191.0, 208.0, 230.0, 280.0]
2025-05-13 09:54:53,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (625.59) for latency MM1Queue_a033_s075
2025-05-13 09:54:53,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 36 minutes, 17 seconds)
2025-05-13 09:58:04,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:58:07,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 601.31073 ± 340.666
2025-05-13 09:58:07,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [651.5715, 723.5685, 665.45856, 179.15681, 552.4423, 194.89334, 649.60034, 1196.1086, 132.6866, 1067.6208]
2025-05-13 09:58:07,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [210.0, 257.0, 213.0, 89.0, 199.0, 103.0, 206.0, 393.0, 68.0, 381.0]
2025-05-13 09:58:07,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 32 minutes, 39 seconds)
2025-05-13 10:01:22,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:01:24,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 625.95435 ± 366.132
2025-05-13 10:01:24,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [22.738651, 674.1611, 1111.5176, 1298.041, 711.9448, 639.4743, 662.7718, 623.06305, 317.14523, 198.68646]
2025-05-13 10:01:24,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 218.0, 345.0, 436.0, 221.0, 199.0, 207.0, 213.0, 124.0, 86.0]
2025-05-13 10:01:24,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (625.95) for latency MM1Queue_a033_s075
2025-05-13 10:01:24,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 30 minutes, 49 seconds)
2025-05-13 10:04:37,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:04:43,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1192.78857 ± 826.752
2025-05-13 10:04:43,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [2105.054, 916.9513, 2885.497, 575.91437, 1885.4905, 294.17902, 49.494274, 1070.9271, 926.1013, 1218.2765]
2025-05-13 10:04:43,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [683.0, 356.0, 1000.0, 211.0, 622.0, 132.0, 30.0, 426.0, 312.0, 441.0]
2025-05-13 10:04:43,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (1192.79) for latency MM1Queue_a033_s075
2025-05-13 10:04:43,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 28 minutes, 17 seconds)
2025-05-13 10:07:53,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:07:57,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 827.93488 ± 776.311
2025-05-13 10:07:57,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [722.3781, 688.3548, 3069.979, 325.08746, 935.9957, 492.77164, 765.25586, 203.93398, 661.6177, 413.97495]
2025-05-13 10:07:57,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [231.0, 220.0, 1000.0, 133.0, 284.0, 184.0, 245.0, 97.0, 212.0, 158.0]
2025-05-13 10:07:57,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 24 minutes, 46 seconds)
2025-05-13 10:11:09,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:11:12,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 543.39923 ± 337.761
2025-05-13 10:11:12,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [153.31848, 897.3992, 861.39404, 426.26523, 1119.4402, 238.63557, 257.71225, 379.48447, 222.1274, 878.21545]
2025-05-13 10:11:12,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 338.0, 315.0, 181.0, 422.0, 111.0, 115.0, 155.0, 108.0, 327.0]
2025-05-13 10:11:12,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 21 minutes, 1 second)
2025-05-13 10:14:25,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:14:28,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 652.96112 ± 392.174
2025-05-13 10:14:28,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [786.4034, 1214.4635, 689.6611, 107.68574, 1344.5828, 511.2694, 259.86252, 550.75726, 859.4268, 205.49892]
2025-05-13 10:14:28,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [255.0, 432.0, 236.0, 62.0, 463.0, 204.0, 117.0, 218.0, 316.0, 95.0]
2025-05-13 10:14:28,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 18 minutes, 34 seconds)
2025-05-13 10:17:37,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:17:39,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 336.29636 ± 452.375
2025-05-13 10:17:39,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [139.2838, 111.155785, 95.83293, 342.36334, 203.21988, 1659.6163, 114.4536, 110.4308, 407.8896, 178.71748]
2025-05-13 10:17:39,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 59.0, 60.0, 149.0, 108.0, 591.0, 68.0, 67.0, 163.0, 86.0]
2025-05-13 10:17:39,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 13 minutes, 22 seconds)
2025-05-13 10:20:51,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:20:54,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 476.90732 ± 280.573
2025-05-13 10:20:54,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [607.18036, 148.41052, 473.91748, 164.68471, 613.5932, 1118.0005, 125.04178, 538.0934, 550.31854, 429.83267]
2025-05-13 10:20:54,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 74.0, 169.0, 80.0, 209.0, 333.0, 66.0, 212.0, 182.0, 163.0]
2025-05-13 10:20:54,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 9 minutes, 10 seconds)
2025-05-13 10:24:05,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:24:09,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 857.76678 ± 767.899
2025-05-13 10:24:09,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [359.6034, 647.4825, 678.1973, 491.67337, 955.55524, 444.29083, 2984.662, 440.38403, 263.57697, 1312.2417]
2025-05-13 10:24:09,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 212.0, 257.0, 199.0, 292.0, 178.0, 1000.0, 167.0, 125.0, 451.0]
2025-05-13 10:24:09,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 6 minutes, 17 seconds)
2025-05-13 10:27:25,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:27:28,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 716.94958 ± 599.371
2025-05-13 10:27:28,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [10.009826, 678.5407, 698.5607, 375.66107, 664.9424, 408.63544, 2284.0479, 222.12411, 1135.6779, 691.2957]
2025-05-13 10:27:28,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 263.0, 242.0, 162.0, 248.0, 186.0, 834.0, 105.0, 416.0, 272.0]
2025-05-13 10:27:28,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 4 minutes, 9 seconds)
2025-05-13 10:30:30,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:30:33,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 537.04358 ± 408.586
2025-05-13 10:30:33,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1672.0844, 364.39474, 505.8261, 650.2793, 253.66042, 143.23192, 272.23114, 503.07803, 622.1676, 383.4826]
2025-05-13 10:30:33,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [533.0, 140.0, 176.0, 210.0, 110.0, 72.0, 115.0, 195.0, 199.0, 154.0]
2025-05-13 10:30:33,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 57 minutes, 53 seconds)
2025-05-13 10:33:51,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:33:55,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 877.15869 ± 971.773
2025-05-13 10:33:55,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [195.65158, 155.23497, 685.37756, 225.29292, 1529.0886, 62.457546, 2704.7573, 2586.6646, 244.77214, 382.2902]
2025-05-13 10:33:55,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 77.0, 264.0, 101.0, 452.0, 42.0, 837.0, 866.0, 108.0, 158.0]
2025-05-13 10:33:55,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 57 minutes, 28 seconds)
2025-05-13 10:36:59,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:37:01,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 498.25455 ± 338.736
2025-05-13 10:37:01,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [189.02621, 231.80467, 430.21466, 81.32996, 728.8582, 1140.5101, 781.508, 398.23242, 851.5522, 149.50874]
2025-05-13 10:37:01,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 107.0, 159.0, 51.0, 234.0, 396.0, 238.0, 152.0, 268.0, 76.0]
2025-05-13 10:37:01,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 52 minutes, 13 seconds)
2025-05-13 10:40:15,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:40:19,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 798.02600 ± 767.249
2025-05-13 10:40:19,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [595.26807, 964.744, 1264.1841, 29.929092, 270.25262, 780.5567, 109.4996, 796.94916, 2813.4043, 355.47256]
2025-05-13 10:40:19,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [202.0, 288.0, 460.0, 25.0, 120.0, 309.0, 58.0, 300.0, 1000.0, 148.0]
2025-05-13 10:40:19,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 49 minutes, 38 seconds)
2025-05-13 10:43:30,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:43:34,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 833.99286 ± 420.885
2025-05-13 10:43:34,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [412.99976, 985.2471, 458.69266, 1764.7467, 1127.3579, 782.80524, 867.5165, 683.3497, 200.05501, 1057.1573]
2025-05-13 10:43:34,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [177.0, 338.0, 177.0, 620.0, 369.0, 278.0, 297.0, 228.0, 106.0, 374.0]
2025-05-13 10:43:34,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 45 minutes, 22 seconds)
2025-05-13 10:46:43,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:46:46,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 684.48962 ± 508.698
2025-05-13 10:46:46,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1888.9044, 784.5691, 603.9989, 748.35175, 337.7632, 98.81056, 97.13808, 982.3324, 321.78134, 981.2469]
2025-05-13 10:46:46,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [559.0, 266.0, 231.0, 237.0, 137.0, 57.0, 54.0, 297.0, 131.0, 336.0]
2025-05-13 10:46:46,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 43 minutes, 43 seconds)
2025-05-13 10:49:55,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:49:57,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 400.07480 ± 191.068
2025-05-13 10:49:57,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [478.23834, 333.24368, 437.45834, 282.66843, 189.4451, 353.6857, 533.3815, 697.289, 648.66425, 46.673588]
2025-05-13 10:49:57,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [167.0, 146.0, 157.0, 129.0, 101.0, 154.0, 185.0, 222.0, 202.0, 34.0]
2025-05-13 10:49:57,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 38 minutes, 1 second)
2025-05-13 10:53:12,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:53:14,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 529.12598 ± 437.975
2025-05-13 10:53:14,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [114.16222, 1597.3555, 232.92426, 465.554, 284.3447, 828.9954, 370.27545, 903.0438, 106.95504, 387.6493]
2025-05-13 10:53:14,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 540.0, 108.0, 181.0, 116.0, 290.0, 171.0, 333.0, 57.0, 148.0]
2025-05-13 10:53:14,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 37 minutes, 17 seconds)
2025-05-13 10:56:24,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:56:26,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 625.00317 ± 362.785
2025-05-13 10:56:26,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1067.5165, 344.70712, 958.01, 1067.5247, 254.74443, 186.90164, 978.60944, 570.45416, 105.65214, 715.9116]
2025-05-13 10:56:26,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [334.0, 143.0, 292.0, 356.0, 111.0, 92.0, 288.0, 211.0, 56.0, 251.0]
2025-05-13 10:56:26,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 32 minutes, 45 seconds)
2025-05-13 10:59:34,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:59:36,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 448.79584 ± 273.646
2025-05-13 10:59:36,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [319.4232, 302.9388, 434.6028, 733.34515, 732.9848, 1036.28, 186.26227, 153.95735, 291.31564, 296.8487]
2025-05-13 10:59:36,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 142.0, 179.0, 231.0, 243.0, 325.0, 100.0, 75.0, 137.0, 138.0]
2025-05-13 10:59:36,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 28 minutes, 28 seconds)
2025-05-13 11:02:48,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:02:50,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 386.99548 ± 133.851
2025-05-13 11:02:50,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [238.57071, 447.2282, 307.30893, 212.3507, 353.33243, 262.06003, 469.75327, 545.3061, 647.76, 386.2846]
2025-05-13 11:02:50,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 169.0, 142.0, 109.0, 151.0, 112.0, 174.0, 194.0, 210.0, 156.0]
2025-05-13 11:02:50,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 25 minutes, 46 seconds)
2025-05-13 11:06:00,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:06:02,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 504.92490 ± 306.724
2025-05-13 11:06:02,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [900.1787, 108.495575, 118.24522, 341.10117, 881.475, 111.05208, 704.06683, 650.89844, 801.3927, 432.34323]
2025-05-13 11:06:02,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [280.0, 58.0, 60.0, 134.0, 256.0, 61.0, 223.0, 208.0, 254.0, 158.0]
2025-05-13 11:06:02,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 22 minutes, 45 seconds)
2025-05-13 11:09:14,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:09:17,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 555.70905 ± 290.072
2025-05-13 11:09:17,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [730.8506, 490.27502, 471.56088, 1014.78925, 246.5768, 105.52781, 184.31924, 843.7211, 682.67755, 786.79205]
2025-05-13 11:09:17,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [220.0, 190.0, 173.0, 298.0, 109.0, 57.0, 85.0, 251.0, 235.0, 230.0]
2025-05-13 11:09:17,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 18 minutes, 55 seconds)
2025-05-13 11:12:26,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:12:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 901.47742 ± 895.910
2025-05-13 11:12:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1749.713, 3201.4126, 668.2312, 117.32838, 1186.2113, 498.74414, 263.83667, 297.41583, 650.1619, 381.72003]
2025-05-13 11:12:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [559.0, 1000.0, 206.0, 63.0, 398.0, 201.0, 113.0, 123.0, 201.0, 145.0]
2025-05-13 11:12:30,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 15 minutes, 51 seconds)
2025-05-13 11:15:40,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:15:42,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 452.53085 ± 337.484
2025-05-13 11:15:42,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1064.1898, 27.695824, 238.67879, 567.727, 914.4108, 21.07711, 262.19727, 491.2846, 259.76154, 678.2858]
2025-05-13 11:15:42,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [304.0, 24.0, 104.0, 198.0, 265.0, 19.0, 112.0, 182.0, 109.0, 226.0]
2025-05-13 11:15:42,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 13 minutes, 10 seconds)
2025-05-13 11:18:54,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:18:57,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 782.52246 ± 598.214
2025-05-13 11:18:57,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [132.5288, 98.71374, 1292.1941, 1681.952, 1211.6727, 574.8609, 1717.3987, 424.637, 279.8432, 411.42355]
2025-05-13 11:18:57,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 56.0, 426.0, 531.0, 403.0, 211.0, 525.0, 156.0, 117.0, 159.0]
2025-05-13 11:18:57,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 10 minutes, 10 seconds)
2025-05-13 11:22:08,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:22:11,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 737.40112 ± 612.080
2025-05-13 11:22:11,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [183.60513, 1037.7544, 1547.6105, 450.16638, 2008.9631, 202.14507, 663.5198, 947.5172, 114.24152, 218.48833]
2025-05-13 11:22:11,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 332.0, 476.0, 175.0, 661.0, 95.0, 234.0, 272.0, 60.0, 98.0]
2025-05-13 11:22:11,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 7 minutes, 22 seconds)
2025-05-13 11:25:21,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:25:27,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1255.51746 ± 1019.457
2025-05-13 11:25:27,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [74.13458, 492.23764, 344.1169, 3091.0652, 1419.1742, 949.0884, 3054.0706, 1678.3506, 972.16534, 480.77106]
2025-05-13 11:25:27,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [44.0, 183.0, 134.0, 1000.0, 443.0, 276.0, 1000.0, 529.0, 342.0, 170.0]
2025-05-13 11:25:27,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (1255.52) for latency MM1Queue_a033_s075
2025-05-13 11:25:27,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 4 minutes, 22 seconds)
2025-05-13 11:28:39,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:28:43,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 786.41052 ± 944.565
2025-05-13 11:28:43,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [102.676025, 3124.3557, 110.83477, 106.08684, 1869.7291, 152.28268, 858.5265, 932.8908, 315.11203, 291.61066]
2025-05-13 11:28:43,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 1000.0, 60.0, 56.0, 591.0, 76.0, 283.0, 307.0, 138.0, 127.0]
2025-05-13 11:28:43,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 1 minute, 40 seconds)
2025-05-13 11:31:59,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:32:03,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 852.44580 ± 591.021
2025-05-13 11:32:03,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [2248.239, 957.5582, 915.70056, 643.1399, 1424.3335, 793.72205, 199.64374, 99.55027, 454.57056, 787.99963]
2025-05-13 11:32:03,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [728.0, 311.0, 345.0, 229.0, 478.0, 244.0, 94.0, 55.0, 176.0, 266.0]
2025-05-13 11:32:03,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 59 minutes, 43 seconds)
2025-05-13 11:35:16,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:35:19,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 708.32068 ± 436.693
2025-05-13 11:35:19,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [763.4671, 573.3679, 1506.567, 703.36945, 590.1148, 146.50871, 114.79139, 1193.7128, 340.52377, 1150.7832]
2025-05-13 11:35:19,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [229.0, 202.0, 457.0, 240.0, 210.0, 74.0, 61.0, 347.0, 139.0, 372.0]
2025-05-13 11:35:19,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 56 minutes, 43 seconds)
2025-05-13 11:38:22,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:38:26,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1014.55627 ± 667.725
2025-05-13 11:38:26,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1246.6407, 1062.9862, 233.6492, 2482.3286, 764.30164, 934.88574, 1549.0449, 206.77216, 1363.8462, 301.1078]
2025-05-13 11:38:26,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [367.0, 334.0, 103.0, 776.0, 261.0, 275.0, 454.0, 90.0, 433.0, 126.0]
2025-05-13 11:38:26,469 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 52 minutes, 12 seconds)
2025-05-13 11:41:47,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:41:51,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 987.31042 ± 797.951
2025-05-13 11:41:51,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [169.44481, 437.51462, 1234.2048, 2001.2845, 1171.3568, 1025.8368, 206.76373, 707.8145, 206.75858, 2712.1252]
2025-05-13 11:41:51,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 166.0, 405.0, 624.0, 355.0, 292.0, 93.0, 250.0, 92.0, 907.0]
2025-05-13 11:41:51,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 50 minutes, 32 seconds)
2025-05-13 11:44:54,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:45:00,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1389.78259 ± 1127.800
2025-05-13 11:45:00,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [516.1641, 363.92218, 908.5422, 3257.316, 2921.1084, 777.0727, 768.3119, 492.9769, 794.1677, 3098.244]
2025-05-13 11:45:00,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [187.0, 141.0, 300.0, 1000.0, 895.0, 265.0, 265.0, 179.0, 271.0, 1000.0]
2025-05-13 11:45:00,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (1389.78) for latency MM1Queue_a033_s075
2025-05-13 11:45:00,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 46 minutes, 5 seconds)
2025-05-13 11:48:18,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:48:22,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 791.33777 ± 455.758
2025-05-13 11:48:22,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [138.96605, 498.3135, 155.42136, 1220.4386, 985.0989, 979.24347, 375.76715, 1399.7814, 768.2648, 1392.0829]
2025-05-13 11:48:22,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 183.0, 79.0, 397.0, 293.0, 288.0, 152.0, 479.0, 242.0, 482.0]
2025-05-13 11:48:22,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 43 minutes, 12 seconds)
2025-05-13 11:51:25,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:51:28,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 669.21521 ± 466.662
2025-05-13 11:51:28,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1045.3907, 1301.8438, 414.9189, 660.11334, 301.9399, 1571.7456, 111.79916, 544.8619, 128.6292, 610.9092]
2025-05-13 11:51:28,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [315.0, 429.0, 155.0, 202.0, 124.0, 489.0, 60.0, 197.0, 64.0, 224.0]
2025-05-13 11:51:28,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 38 minutes, 19 seconds)
2025-05-13 11:54:38,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:54:42,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 905.28528 ± 720.564
2025-05-13 11:54:42,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [222.2275, 463.05957, 584.0686, 2764.8203, 1411.2559, 801.6124, 1101.5591, 1016.936, 444.51752, 242.79544]
2025-05-13 11:54:42,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 176.0, 208.0, 879.0, 477.0, 277.0, 350.0, 346.0, 173.0, 101.0]
2025-05-13 11:54:42,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 36 minutes, 7 seconds)
2025-05-13 11:57:55,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:58:00,994 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1116.77893 ± 991.766
2025-05-13 11:58:00,994 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1018.8139, 2945.809, 137.15503, 606.4344, 1015.8884, 102.06314, 2110.7837, 2532.3635, 189.641, 508.83667]
2025-05-13 11:58:00,994 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [334.0, 918.0, 73.0, 213.0, 328.0, 61.0, 688.0, 834.0, 92.0, 188.0]
2025-05-13 11:58:01,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 31 minutes, 54 seconds)
2025-05-13 12:01:11,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:01:15,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 801.41412 ± 621.587
2025-05-13 12:01:15,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [573.6226, 398.18808, 868.85596, 157.21423, 144.18365, 2365.1758, 1253.0021, 974.87146, 821.0844, 457.94308]
2025-05-13 12:01:15,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 154.0, 291.0, 75.0, 71.0, 737.0, 402.0, 331.0, 297.0, 178.0]
2025-05-13 12:01:15,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 29 minutes, 27 seconds)
2025-05-13 12:04:22,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:04:25,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 785.48840 ± 345.621
2025-05-13 12:04:25,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [824.13727, 1078.3076, 945.69415, 876.30054, 177.70285, 654.5989, 1198.1819, 684.5297, 210.59981, 1204.8313]
2025-05-13 12:04:25,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [271.0, 348.0, 311.0, 289.0, 83.0, 241.0, 380.0, 234.0, 101.0, 374.0]
2025-05-13 12:04:25,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 24 minutes, 31 seconds)
2025-05-13 12:07:37,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:07:44,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1529.67664 ± 1000.018
2025-05-13 12:07:44,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [202.42377, 385.00076, 2429.2803, 1009.3456, 2815.051, 860.1229, 973.6188, 1159.6246, 2275.7644, 3186.5344]
2025-05-13 12:07:44,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 152.0, 773.0, 329.0, 869.0, 292.0, 322.0, 405.0, 705.0, 1000.0]
2025-05-13 12:07:44,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (1529.68) for latency MM1Queue_a033_s075
2025-05-13 12:07:44,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 23 minutes, 6 seconds)
2025-05-13 12:10:54,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:10:57,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 604.44397 ± 352.353
2025-05-13 12:10:57,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1181.5193, 481.51453, 1071.581, 107.268265, 567.0524, 590.2552, 40.836613, 435.5327, 872.23724, 696.642]
2025-05-13 12:10:57,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [394.0, 175.0, 304.0, 57.0, 205.0, 206.0, 32.0, 172.0, 297.0, 233.0]
2025-05-13 12:10:57,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 19 minutes, 44 seconds)
2025-05-13 12:14:11,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:14:18,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1558.37634 ± 1166.520
2025-05-13 12:14:18,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [2005.9403, 647.02795, 1040.0153, 216.35281, 3185.0305, 931.20825, 909.0953, 3189.917, 3200.2385, 258.93726]
2025-05-13 12:14:18,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [655.0, 238.0, 360.0, 96.0, 1000.0, 326.0, 314.0, 1000.0, 1000.0, 110.0]
2025-05-13 12:14:18,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (1558.38) for latency MM1Queue_a033_s075
2025-05-13 12:14:18,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 16 minutes, 50 seconds)
2025-05-13 12:17:31,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:17:37,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1344.51294 ± 1076.623
2025-05-13 12:17:37,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [484.66312, 572.80994, 3011.8057, 274.85318, 2238.679, 839.94293, 782.1438, 3054.075, 2071.047, 115.110794]
2025-05-13 12:17:37,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 202.0, 1000.0, 117.0, 771.0, 308.0, 270.0, 1000.0, 690.0, 62.0]
2025-05-13 12:17:37,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 14 minutes, 18 seconds)
2025-05-13 12:20:45,469 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:20:49,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 848.78748 ± 895.365
2025-05-13 12:20:49,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [18.74431, 2343.0786, 114.14276, 543.95294, 668.96857, 314.1096, 670.6256, 2754.227, 903.0967, 156.92851]
2025-05-13 12:20:49,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 756.0, 62.0, 203.0, 240.0, 130.0, 232.0, 872.0, 301.0, 74.0]
2025-05-13 12:20:49,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 11 minutes, 8 seconds)
2025-05-13 12:23:59,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:24:05,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1426.91858 ± 1069.908
2025-05-13 12:24:05,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [39.22867, 1223.3918, 2296.14, 3191.9604, 950.2999, 184.94533, 1290.8069, 392.1119, 3073.0881, 1627.2133]
2025-05-13 12:24:05,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 409.0, 734.0, 1000.0, 328.0, 91.0, 435.0, 151.0, 1000.0, 466.0]
2025-05-13 12:24:05,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 7 minutes, 31 seconds)
2025-05-13 12:27:18,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:27:23,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 969.06067 ± 1128.653
2025-05-13 12:27:23,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [134.68933, 204.45004, 674.4457, 122.15408, 273.6865, 645.1231, 819.0197, 3128.5222, 457.26352, 3231.252]
2025-05-13 12:27:23,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 105.0, 239.0, 63.0, 118.0, 198.0, 277.0, 1000.0, 172.0, 1000.0]
2025-05-13 12:27:23,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 4 minutes, 52 seconds)
2025-05-13 12:30:30,507 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:30:33,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 571.47711 ± 642.146
2025-05-13 12:30:33,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [114.3756, 7.1861167, 8.207696, 2110.7036, 1265.9799, 21.351927, 736.1065, 620.011, 262.8991, 567.9496]
2025-05-13 12:30:33,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 10.0, 11.0, 660.0, 410.0, 20.0, 256.0, 225.0, 112.0, 207.0]
2025-05-13 12:30:33,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 12 seconds)
2025-05-13 12:33:46,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:33:49,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 567.23865 ± 513.578
2025-05-13 12:33:49,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [155.07684, 736.0363, 120.81281, 156.60823, 91.4609, 293.35056, 862.61334, 452.53366, 1030.4084, 1773.4852]
2025-05-13 12:33:49,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 252.0, 65.0, 77.0, 52.0, 123.0, 268.0, 171.0, 295.0, 558.0]
2025-05-13 12:33:49,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 56 minutes, 34 seconds)
2025-05-13 12:37:07,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:37:12,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1319.97925 ± 1079.386
2025-05-13 12:37:12,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [322.44843, 138.18758, 3171.6611, 202.18019, 1827.2369, 1596.7433, 890.9515, 2587.7056, 2335.8794, 126.79899]
2025-05-13 12:37:12,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 72.0, 1000.0, 95.0, 599.0, 523.0, 297.0, 808.0, 749.0, 66.0]
2025-05-13 12:37:12,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 54 minutes, 45 seconds)
2025-05-13 12:40:18,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:40:22,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 840.28485 ± 432.512
2025-05-13 12:40:22,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [488.7291, 875.8441, 852.62665, 246.1483, 668.8698, 1839.4303, 451.4584, 881.0829, 799.59644, 1299.0624]
2025-05-13 12:40:22,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [178.0, 305.0, 286.0, 112.0, 237.0, 586.0, 180.0, 286.0, 244.0, 378.0]
2025-05-13 12:40:22,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 50 minutes, 42 seconds)
2025-05-13 12:43:33,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:43:37,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 889.61877 ± 840.122
2025-05-13 12:43:37,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [420.4492, 1128.0853, 414.20117, 18.93135, 1014.6475, 517.5731, 676.34576, 1272.0281, 3145.6292, 288.2972]
2025-05-13 12:43:37,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [166.0, 350.0, 154.0, 19.0, 304.0, 182.0, 242.0, 429.0, 1000.0, 126.0]
2025-05-13 12:43:37,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 47 minutes, 13 seconds)
2025-05-13 12:46:51,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:46:53,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 467.39453 ± 272.355
2025-05-13 12:46:53,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [783.6594, 679.87317, 226.66344, 870.7442, 93.39367, 337.28955, 699.71277, 283.45645, 577.1613, 121.99191]
2025-05-13 12:46:53,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [271.0, 228.0, 105.0, 287.0, 54.0, 136.0, 246.0, 119.0, 210.0, 63.0]
2025-05-13 12:46:53,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 44 minutes, 33 seconds)
2025-05-13 12:50:00,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:50:05,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1121.01477 ± 640.467
2025-05-13 12:50:05,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [921.8197, 1378.6356, 214.57202, 848.00336, 2010.6091, 842.98944, 1103.1874, 2352.2188, 1252.3517, 285.76056]
2025-05-13 12:50:05,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [280.0, 465.0, 101.0, 251.0, 635.0, 290.0, 369.0, 725.0, 412.0, 122.0]
2025-05-13 12:50:05,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 40 minutes, 53 seconds)
2025-05-13 12:53:16,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:53:20,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 907.82288 ± 847.574
2025-05-13 12:53:20,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1689.5425, 232.89215, 1014.3581, 266.65445, 469.0338, 933.1714, 3111.5852, 584.7401, 403.15045, 373.1009]
2025-05-13 12:53:20,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [548.0, 102.0, 359.0, 120.0, 178.0, 313.0, 1000.0, 227.0, 156.0, 146.0]
2025-05-13 12:53:20,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 36 minutes, 48 seconds)
2025-05-13 12:56:30,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:56:35,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1111.34485 ± 876.739
2025-05-13 12:56:35,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [287.1055, 1706.0294, 1703.6189, 590.0617, 1479.2195, 606.1391, 341.20468, 499.3114, 672.69666, 3228.061]
2025-05-13 12:56:35,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 540.0, 567.0, 222.0, 472.0, 213.0, 140.0, 181.0, 235.0, 1000.0]
2025-05-13 12:56:35,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 34 minutes, 4 seconds)
2025-05-13 12:59:58,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:00:04,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1333.76379 ± 1206.077
2025-05-13 13:00:04,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [732.1452, 214.14832, 924.8194, 90.60264, 203.54991, 3120.1033, 3175.905, 3041.0386, 699.5998, 1135.7255]
2025-05-13 13:00:04,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [244.0, 95.0, 309.0, 51.0, 94.0, 1000.0, 1000.0, 964.0, 239.0, 334.0]
2025-05-13 13:00:04,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 32 minutes, 3 seconds)
2025-05-13 13:03:04,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:03:11,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1577.69556 ± 1305.413
2025-05-13 13:03:11,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [2902.098, 1164.3223, 259.9564, 19.57589, 3158.0674, 989.13336, 253.52946, 593.87946, 3263.818, 3172.5747]
2025-05-13 13:03:11,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [911.0, 347.0, 116.0, 16.0, 1000.0, 287.0, 108.0, 208.0, 1000.0, 1000.0]
2025-05-13 13:03:11,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (1577.70) for latency MM1Queue_a033_s075
2025-05-13 13:03:11,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 28 minutes, 2 seconds)
2025-05-13 13:06:27,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:06:30,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 691.80103 ± 539.198
2025-05-13 13:06:30,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1865.0875, 271.0304, 15.968091, 288.2847, 1028.2626, 1177.2991, 350.60135, 919.70844, 232.60417, 769.16394]
2025-05-13 13:06:30,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [584.0, 116.0, 17.0, 120.0, 326.0, 385.0, 137.0, 277.0, 106.0, 246.0]
2025-05-13 13:06:30,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 25 minutes, 21 seconds)
2025-05-13 13:09:38,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:09:43,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1124.55103 ± 893.441
2025-05-13 13:09:43,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [181.94023, 648.21063, 463.99268, 1543.3153, 333.76068, 2395.9836, 608.75757, 983.36194, 3053.0293, 1033.1594]
2025-05-13 13:09:43,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 231.0, 172.0, 504.0, 134.0, 785.0, 208.0, 283.0, 1000.0, 351.0]
2025-05-13 13:09:43,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 21 minutes, 51 seconds)
2025-05-13 13:12:50,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:12:57,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1665.65503 ± 1210.688
2025-05-13 13:12:57,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [929.791, 924.2309, 3079.9905, 3052.302, 295.15866, 442.78467, 3188.5718, 1353.8356, 307.24466, 3082.6416]
2025-05-13 13:12:57,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [268.0, 271.0, 1000.0, 911.0, 121.0, 170.0, 1000.0, 425.0, 125.0, 1000.0]
2025-05-13 13:12:57,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (1665.66) for latency MM1Queue_a033_s075
2025-05-13 13:12:57,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 18 minutes, 35 seconds)
2025-05-13 13:16:12,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:16:16,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 855.28259 ± 754.024
2025-05-13 13:16:16,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [688.211, 1163.614, 3014.745, 667.7805, 568.7513, 708.68945, 519.9772, 249.7372, 421.7893, 549.53125]
2025-05-13 13:16:16,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [240.0, 391.0, 951.0, 214.0, 202.0, 247.0, 173.0, 110.0, 160.0, 196.0]
2025-05-13 13:16:16,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 14 minutes, 31 seconds)
2025-05-13 13:19:30,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:19:36,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1368.13831 ± 777.130
2025-05-13 13:19:36,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [2106.4714, 634.0305, 1493.7335, 3093.6643, 1618.0651, 968.6542, 1492.8541, 831.9748, 173.56036, 1268.3751]
2025-05-13 13:19:36,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [728.0, 228.0, 521.0, 1000.0, 554.0, 325.0, 483.0, 279.0, 82.0, 359.0]
2025-05-13 13:19:36,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 12 minutes, 15 seconds)
2025-05-13 13:22:41,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:22:46,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1154.80859 ± 973.805
2025-05-13 13:22:46,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [264.8687, 2463.5652, 3100.9429, 731.65796, 554.53766, 962.7774, 353.3058, 1978.2944, 1120.299, 17.838278]
2025-05-13 13:22:46,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 818.0, 1000.0, 256.0, 196.0, 340.0, 147.0, 662.0, 396.0, 18.0]
2025-05-13 13:22:46,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 8 minutes, 19 seconds)
2025-05-13 13:26:01,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:26:08,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1498.85742 ± 859.898
2025-05-13 13:26:08,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1941.0033, 1281.9347, 219.65521, 2433.6394, 145.15309, 1646.3185, 2516.656, 1921.0343, 2326.6191, 556.5607]
2025-05-13 13:26:08,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [640.0, 434.0, 98.0, 790.0, 72.0, 556.0, 841.0, 642.0, 792.0, 202.0]
2025-05-13 13:26:08,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 5 minutes, 41 seconds)
2025-05-13 13:29:16,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:29:22,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1326.21094 ± 1059.609
2025-05-13 13:29:22,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [2637.0803, 2545.6406, 741.0719, 315.5113, 985.4095, 2905.4883, 127.1493, 2238.17, 399.88293, 366.7065]
2025-05-13 13:29:22,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [880.0, 754.0, 260.0, 129.0, 293.0, 969.0, 64.0, 707.0, 156.0, 147.0]
2025-05-13 13:29:22,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 2 minutes, 20 seconds)
2025-05-13 13:32:32,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:32:37,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1093.36157 ± 1037.478
2025-05-13 13:32:37,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1121.3234, 640.7042, 1903.1823, 955.95123, 21.036972, 3099.5613, 151.00888, 158.88885, 2604.0732, 277.88495]
2025-05-13 13:32:37,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [330.0, 224.0, 635.0, 282.0, 21.0, 1000.0, 74.0, 76.0, 854.0, 116.0]
2025-05-13 13:32:37,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 58 minutes, 52 seconds)
2025-05-13 13:35:58,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:36:03,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1217.48767 ± 726.230
2025-05-13 13:36:03,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1697.2577, 1276.9854, 3089.7458, 502.78326, 778.5765, 696.6979, 772.4648, 1236.1884, 1458.1165, 666.06104]
2025-05-13 13:36:03,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [534.0, 415.0, 1000.0, 183.0, 246.0, 240.0, 260.0, 421.0, 492.0, 235.0]
2025-05-13 13:36:03,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 55 minutes, 56 seconds)
2025-05-13 13:39:06,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:39:11,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1120.88025 ± 974.004
2025-05-13 13:39:11,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [338.41788, 216.64641, 11.717464, 1986.196, 2976.6099, 2403.7214, 698.3894, 262.32205, 969.9914, 1344.7911]
2025-05-13 13:39:11,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [136.0, 105.0, 15.0, 683.0, 1000.0, 814.0, 247.0, 111.0, 280.0, 471.0]
2025-05-13 13:39:11,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 52 minutes, 32 seconds)
2025-05-13 13:42:25,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:42:29,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 995.32666 ± 816.259
2025-05-13 13:42:29,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [3109.323, 440.17276, 800.33496, 985.854, 1472.0696, 310.79868, 582.07275, 1475.299, 303.76465, 473.57852]
2025-05-13 13:42:29,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 163.0, 239.0, 345.0, 496.0, 129.0, 205.0, 419.0, 134.0, 172.0]
2025-05-13 13:42:29,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 49 minutes, 3 seconds)
2025-05-13 13:45:43,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:45:48,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1080.01294 ± 918.997
2025-05-13 13:45:48,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [13.612503, 372.2182, 474.4134, 1606.3612, 1157.5228, 271.32257, 399.49835, 1952.2196, 1434.8486, 3118.1113]
2025-05-13 13:45:48,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 144.0, 176.0, 549.0, 347.0, 117.0, 154.0, 651.0, 473.0, 1000.0]
2025-05-13 13:45:48,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 46 minutes, 1 second)
2025-05-13 13:48:54,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:49:00,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1469.53784 ± 1355.935
2025-05-13 13:49:00,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [158.26816, 186.66571, 3051.5247, 351.24368, 3129.148, 3208.3794, 352.7951, 3097.5474, 646.35187, 513.4543]
2025-05-13 13:49:00,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 90.0, 947.0, 139.0, 1000.0, 1000.0, 141.0, 1000.0, 226.0, 190.0]
2025-05-13 13:49:00,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 42 minutes, 36 seconds)
2025-05-13 13:52:14,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:52:18,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1003.38806 ± 649.409
2025-05-13 13:52:18,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1353.1282, 825.5564, 2395.9355, 957.11084, 1572.2885, 172.6405, 1371.6602, 380.7088, 706.1122, 298.73926]
2025-05-13 13:52:18,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [428.0, 243.0, 784.0, 275.0, 502.0, 82.0, 388.0, 153.0, 249.0, 129.0]
2025-05-13 13:52:18,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 38 minutes, 59 seconds)
2025-05-13 13:55:34,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:55:39,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1246.51794 ± 1000.621
2025-05-13 13:55:39,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1643.3372, 541.58813, 340.76804, 2821.5457, 432.0717, 3183.7542, 1426.0302, 973.0687, 1084.2642, 18.75173]
2025-05-13 13:55:39,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [493.0, 201.0, 139.0, 890.0, 159.0, 1000.0, 409.0, 281.0, 339.0, 19.0]
2025-05-13 13:55:39,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 36 minutes, 13 seconds)
2025-05-13 13:58:42,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:58:46,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 948.14551 ± 889.690
2025-05-13 13:58:46,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [312.51227, 348.37634, 373.69263, 716.82935, 2798.9663, 1055.9735, 392.53442, 873.3339, 2479.909, 129.32744]
2025-05-13 13:58:46,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 137.0, 150.0, 224.0, 893.0, 350.0, 152.0, 297.0, 803.0, 69.0]
2025-05-13 13:58:46,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 32 minutes, 33 seconds)
2025-05-13 14:02:00,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:02:06,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1324.55066 ± 965.733
2025-05-13 14:02:06,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1145.0995, 1549.4985, 217.71223, 571.0338, 2703.593, 1913.0619, 199.16113, 619.6717, 3171.337, 1155.338]
2025-05-13 14:02:06,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [358.0, 534.0, 96.0, 207.0, 850.0, 611.0, 90.0, 217.0, 1000.0, 327.0]
2025-05-13 14:02:06,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 29 minutes, 20 seconds)
2025-05-13 14:05:14,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:05:19,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1169.31470 ± 875.854
2025-05-13 14:05:19,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [702.68665, 388.59686, 940.0358, 3173.8323, 2283.0247, 228.4402, 1486.7089, 704.4127, 1190.8553, 594.5534]
2025-05-13 14:05:19,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [229.0, 152.0, 322.0, 999.0, 743.0, 100.0, 492.0, 252.0, 411.0, 213.0]
2025-05-13 14:05:19,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 26 minutes, 5 seconds)
2025-05-13 14:08:43,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:08:48,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1100.94568 ± 923.445
2025-05-13 14:08:48,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [436.59625, 1388.7235, 434.48953, 346.94223, 284.8777, 230.44981, 3188.9614, 956.85114, 1893.6863, 1847.8794]
2025-05-13 14:08:48,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 464.0, 165.0, 136.0, 117.0, 103.0, 1000.0, 284.0, 598.0, 535.0]
2025-05-13 14:08:48,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes, 6 seconds)
2025-05-13 14:11:48,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:11:53,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1087.28369 ± 849.357
2025-05-13 14:11:53,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [758.96405, 1369.0216, 171.48576, 729.5166, 1645.3135, 671.9274, 133.14589, 563.66187, 1756.642, 3073.158]
2025-05-13 14:11:53,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [270.0, 463.0, 81.0, 261.0, 560.0, 249.0, 66.0, 213.0, 593.0, 1000.0]
2025-05-13 14:11:53,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 28 seconds)
2025-05-13 14:15:01,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:15:07,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1398.22925 ± 1193.459
2025-05-13 14:15:07,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [866.0124, 3074.5132, 2135.5576, 146.54378, 1040.6941, 2871.132, 382.8546, 11.580342, 3063.31, 390.09396]
2025-05-13 14:15:07,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [293.0, 1000.0, 683.0, 72.0, 311.0, 893.0, 146.0, 15.0, 1000.0, 151.0]
2025-05-13 14:15:07,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 20 seconds)
2025-05-13 14:18:21,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:18:27,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1342.47253 ± 1029.501
2025-05-13 14:18:27,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [3064.0552, 438.9506, 804.8115, 1799.2043, 264.83423, 433.50302, 1709.1067, 3252.4329, 720.609, 937.2182]
2025-05-13 14:18:27,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [956.0, 168.0, 278.0, 581.0, 113.0, 171.0, 577.0, 1000.0, 263.0, 304.0]
2025-05-13 14:18:27,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 4 seconds)
2025-05-13 14:21:42,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:21:46,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 863.53143 ± 865.884
2025-05-13 14:21:46,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [2198.1787, 191.81693, 373.9139, 2196.1675, 1356.1755, 105.91974, 298.0882, 1780.1932, 118.38163, 16.478035]
2025-05-13 14:21:46,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [717.0, 87.0, 158.0, 712.0, 447.0, 56.0, 125.0, 576.0, 63.0, 16.0]
2025-05-13 14:21:46,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 52 seconds)
2025-05-13 14:24:53,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:24:59,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1243.19812 ± 1158.614
2025-05-13 14:24:59,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [937.16046, 1587.6674, 533.0544, 3145.9023, 2305.3982, 366.47507, 121.62266, 3122.9731, 25.53058, 286.19696]
2025-05-13 14:24:59,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [323.0, 523.0, 198.0, 1000.0, 750.0, 165.0, 66.0, 1000.0, 25.0, 125.0]
2025-05-13 14:24:59,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 28 seconds)
2025-05-13 14:28:09,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:28:14,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1111.76721 ± 1114.155
2025-05-13 14:28:14,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1553.3628, 208.89967, 2427.4993, 2851.8606, 25.165386, 388.2557, 404.36606, 130.52061, 2802.405, 325.33704]
2025-05-13 14:28:14,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [526.0, 97.0, 787.0, 919.0, 25.0, 147.0, 154.0, 64.0, 867.0, 126.0]
2025-05-13 14:28:14,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 16 seconds)
2025-05-13 14:31:30,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:31:36,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 1492.67651 ± 1139.664
2025-05-13 14:31:36,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [1143.1202, 588.8395, 2348.072, 3019.7659, 988.0487, 320.93747, 3205.6162, 330.89865, 250.97404, 2730.493]
2025-05-13 14:31:36,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [394.0, 216.0, 750.0, 1000.0, 339.0, 132.0, 1000.0, 136.0, 111.0, 867.0]
2025-05-13 14:31:36,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1251 [DEBUG]: Training session finished
