2025-05-13 09:06:54,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mda-mem16
2025-05-13 09:06:54,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mda-mem16
2025-05-13 09:06:54,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14c4d799e450>}
2025-05-13 09:06:54,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:54,367 baseline-bpql-mda-noisy-hopper:91 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-05-13 09:06:54,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-13 09:06:54,384 baseline-bpql-mda-noisy-hopper:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-13 09:06:54,384 baseline-bpql-mda-noisy-hopper:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:54,389 baseline-bpql-mda-noisy-hopper:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(3, 384, batch_first=True)
)
2025-05-13 09:06:55,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:55,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:07,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:10:08,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 96.92034 ± 24.518
2025-05-13 09:10:08,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [92.483444, 111.55099, 105.68699, 103.77409, 122.39509, 134.34195, 59.81639, 62.815186, 66.822716, 109.5166]
2025-05-13 09:10:08,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 81.0, 80.0, 63.0, 75.0, 82.0, 47.0, 40.0, 54.0, 81.0]
2025-05-13 09:10:08,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (96.92) for latency MM1Queue_a033_s075
2025-05-13 09:10:08,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 18 minutes, 57 seconds)
2025-05-13 09:13:28,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:13:31,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 180.65512 ± 70.762
2025-05-13 09:13:31,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [144.52692, 118.85832, 150.30623, 270.2045, 265.6376, 227.27975, 288.6086, 132.784, 82.16918, 126.175995]
2025-05-13 09:13:31,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 147.0, 178.0, 299.0, 238.0, 256.0, 319.0, 160.0, 109.0, 156.0]
2025-05-13 09:13:31,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (180.66) for latency MM1Queue_a033_s075
2025-05-13 09:13:31,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 24 minutes, 5 seconds)
2025-05-13 09:16:53,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:16:55,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 147.43820 ± 57.953
2025-05-13 09:16:55,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [134.76518, 148.12515, 79.24899, 20.97121, 182.34465, 225.33139, 139.3579, 216.99658, 157.80844, 169.43248]
2025-05-13 09:16:55,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 117.0, 71.0, 24.0, 152.0, 189.0, 111.0, 190.0, 130.0, 140.0]
2025-05-13 09:16:55,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 23 minutes, 41 seconds)
2025-05-13 09:20:17,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:20:19,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 128.63612 ± 71.472
2025-05-13 09:20:19,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [70.98428, 105.8361, 190.79552, 82.510635, 56.66912, 268.4562, 67.20133, 63.399445, 211.40181, 169.10672]
2025-05-13 09:20:19,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 97.0, 170.0, 74.0, 60.0, 277.0, 56.0, 61.0, 192.0, 174.0]
2025-05-13 09:20:19,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 21 minutes, 45 seconds)
2025-05-13 09:23:38,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:23:40,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 113.07886 ± 30.532
2025-05-13 09:23:40,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [111.867035, 85.112785, 84.69089, 190.22894, 105.12081, 90.17132, 107.86598, 133.3489, 91.86312, 130.51878]
2025-05-13 09:23:40,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 67.0, 62.0, 97.0, 79.0, 70.0, 80.0, 84.0, 74.0, 84.0]
2025-05-13 09:23:40,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 18 minutes, 16 seconds)
2025-05-13 09:27:05,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:27:11,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 337.49380 ± 243.920
2025-05-13 09:27:11,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [60.194828, 190.5327, 766.9044, 596.8854, 210.65388, 368.70782, 84.19082, 692.812, 219.68544, 184.37059]
2025-05-13 09:27:11,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 197.0, 770.0, 596.0, 195.0, 355.0, 94.0, 668.0, 215.0, 178.0]
2025-05-13 09:27:11,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (337.49) for latency MM1Queue_a033_s075
2025-05-13 09:27:11,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 20 minutes, 27 seconds)
2025-05-13 09:30:30,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:30:37,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 297.96521 ± 247.723
2025-05-13 09:30:37,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [906.2586, 151.1115, 522.1586, 100.03232, 399.61923, 108.05991, 273.0338, 333.57257, 104.162155, 81.6434]
2025-05-13 09:30:37,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 183.0, 637.0, 152.0, 454.0, 124.0, 387.0, 386.0, 151.0, 128.0]
2025-05-13 09:30:37,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 17 minutes, 46 seconds)
2025-05-13 09:33:57,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:33:59,468 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 105.08469 ± 52.672
2025-05-13 09:33:59,468 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [21.835455, 98.864655, 92.95677, 108.068436, 92.316635, 107.497475, 91.73944, 107.76987, 83.93901, 245.85916]
2025-05-13 09:33:59,468 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 90.0, 79.0, 86.0, 79.0, 92.0, 79.0, 94.0, 73.0, 239.0]
2025-05-13 09:33:59,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 13 minutes, 55 seconds)
2025-05-13 09:37:21,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:37:24,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 217.60405 ± 114.945
2025-05-13 09:37:24,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [208.84544, 159.96722, 178.94478, 142.76915, 181.55357, 481.90558, 115.524506, 191.62201, 391.99658, 122.91161]
2025-05-13 09:37:24,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 135.0, 149.0, 113.0, 140.0, 459.0, 87.0, 176.0, 315.0, 89.0]
2025-05-13 09:37:24,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 10 minutes, 48 seconds)
2025-05-13 09:40:46,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:40:48,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 187.16000 ± 45.131
2025-05-13 09:40:48,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [159.71332, 140.31755, 244.29854, 170.65497, 264.51617, 169.60895, 156.03282, 140.10269, 174.1015, 252.25352]
2025-05-13 09:40:48,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 113.0, 197.0, 131.0, 227.0, 134.0, 128.0, 113.0, 136.0, 216.0]
2025-05-13 09:40:48,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 8 minutes, 38 seconds)
2025-05-13 09:44:11,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:44:14,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 217.30881 ± 95.337
2025-05-13 09:44:14,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [128.674, 197.2213, 383.73837, 134.43938, 164.59718, 145.90132, 147.3856, 207.10335, 398.64868, 265.37912]
2025-05-13 09:44:14,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 159.0, 342.0, 106.0, 124.0, 114.0, 109.0, 169.0, 391.0, 226.0]
2025-05-13 09:44:14,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 3 minutes, 31 seconds)
2025-05-13 09:47:35,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:47:36,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 112.46625 ± 19.524
2025-05-13 09:47:36,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [109.21516, 69.528625, 108.273254, 92.797714, 122.97963, 131.809, 124.828445, 142.55142, 113.35005, 109.329185]
2025-05-13 09:47:36,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 54.0, 68.0, 66.0, 72.0, 96.0, 71.0, 98.0, 66.0, 70.0]
2025-05-13 09:47:36,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 58 minutes, 57 seconds)
2025-05-13 09:50:57,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:50:58,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 158.38483 ± 44.572
2025-05-13 09:50:58,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [237.34543, 88.13543, 165.17522, 170.14511, 216.71104, 143.29369, 119.814545, 116.596466, 135.03166, 191.59978]
2025-05-13 09:50:58,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 63.0, 97.0, 99.0, 103.0, 83.0, 86.0, 71.0, 81.0, 118.0]
2025-05-13 09:50:58,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 55 minutes, 39 seconds)
2025-05-13 09:54:22,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:54:24,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 136.15335 ± 22.484
2025-05-13 09:54:24,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [127.68018, 152.64723, 151.11722, 115.7479, 148.78143, 144.60089, 128.12437, 99.80797, 113.046265, 179.98006]
2025-05-13 09:54:24,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 118.0, 119.0, 90.0, 116.0, 111.0, 93.0, 71.0, 80.0, 131.0]
2025-05-13 09:54:24,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 52 minutes, 25 seconds)
2025-05-13 09:57:44,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:57:46,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 167.67499 ± 53.329
2025-05-13 09:57:46,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [249.57986, 165.21191, 199.87926, 111.25477, 129.83258, 145.98975, 157.92575, 197.79395, 244.70363, 74.578445]
2025-05-13 09:57:46,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 97.0, 117.0, 74.0, 79.0, 89.0, 97.0, 107.0, 124.0, 46.0]
2025-05-13 09:57:46,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 48 minutes, 10 seconds)
2025-05-13 10:01:06,468 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:01:08,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 223.47144 ± 42.824
2025-05-13 10:01:08,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [187.30539, 270.8659, 195.96693, 220.92883, 167.25888, 272.27884, 176.68033, 297.69202, 203.11024, 242.62704]
2025-05-13 10:01:08,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 122.0, 108.0, 107.0, 91.0, 121.0, 94.0, 128.0, 97.0, 116.0]
2025-05-13 10:01:08,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 43 minutes, 54 seconds)
2025-05-13 10:04:29,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:04:30,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 165.86917 ± 35.220
2025-05-13 10:04:30,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [140.25346, 98.33351, 191.91168, 212.70325, 207.61322, 156.96568, 196.67671, 144.97766, 175.31195, 133.94467]
2025-05-13 10:04:30,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 68.0, 115.0, 116.0, 119.0, 102.0, 119.0, 87.0, 108.0, 83.0]
2025-05-13 10:04:30,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 40 minutes, 41 seconds)
2025-05-13 10:07:49,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:07:51,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 164.76671 ± 43.964
2025-05-13 10:07:51,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [74.283035, 161.74553, 170.30542, 117.36998, 191.49265, 167.38542, 178.50404, 252.8786, 157.55577, 176.14671]
2025-05-13 10:07:51,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 90.0, 96.0, 68.0, 95.0, 95.0, 100.0, 115.0, 91.0, 93.0]
2025-05-13 10:07:51,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 36 minutes, 40 seconds)
2025-05-13 10:11:12,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:11:13,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 150.00993 ± 27.955
2025-05-13 10:11:13,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [181.24214, 168.09108, 184.71992, 134.4166, 137.69803, 106.425385, 120.51668, 180.00363, 167.56386, 119.422035]
2025-05-13 10:11:13,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 102.0, 102.0, 84.0, 86.0, 68.0, 75.0, 101.0, 97.0, 73.0]
2025-05-13 10:11:13,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 32 minutes, 30 seconds)
2025-05-13 10:14:35,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:14:37,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 172.82541 ± 57.765
2025-05-13 10:14:37,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [112.1756, 278.92572, 133.9216, 110.35659, 175.71672, 268.06912, 121.37599, 202.06125, 151.98758, 173.66399]
2025-05-13 10:14:37,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 142.0, 80.0, 68.0, 93.0, 142.0, 73.0, 111.0, 85.0, 96.0]
2025-05-13 10:14:37,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 29 minutes, 40 seconds)
2025-05-13 10:17:57,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:17:59,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 216.49234 ± 43.615
2025-05-13 10:17:59,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [225.34705, 208.98407, 220.85991, 298.65436, 228.97693, 189.61934, 224.45872, 232.96658, 112.25412, 222.80219]
2025-05-13 10:17:59,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 100.0, 111.0, 130.0, 108.0, 93.0, 107.0, 112.0, 68.0, 112.0]
2025-05-13 10:17:59,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 26 minutes, 18 seconds)
2025-05-13 10:21:20,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:21:22,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 148.74014 ± 52.543
2025-05-13 10:21:22,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [108.042885, 85.96789, 212.78998, 163.1728, 109.89404, 148.67204, 113.31017, 190.60391, 101.53804, 253.40968]
2025-05-13 10:21:22,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 61.0, 109.0, 87.0, 70.0, 86.0, 68.0, 94.0, 62.0, 130.0]
2025-05-13 10:21:22,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 22 minutes, 57 seconds)
2025-05-13 10:24:43,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:24:45,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 198.15097 ± 60.701
2025-05-13 10:24:45,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [129.64977, 246.02951, 123.281395, 245.55046, 245.4242, 239.7627, 262.33167, 124.20757, 119.49032, 245.78204]
2025-05-13 10:24:45,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 116.0, 78.0, 111.0, 112.0, 119.0, 120.0, 78.0, 72.0, 115.0]
2025-05-13 10:24:45,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 20 minutes, 13 seconds)
2025-05-13 10:28:05,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:28:07,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 223.64767 ± 95.086
2025-05-13 10:28:07,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [144.29323, 86.4802, 250.44188, 277.78516, 115.19204, 258.55692, 399.09583, 233.1954, 329.27637, 142.15977]
2025-05-13 10:28:07,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 59.0, 112.0, 132.0, 67.0, 122.0, 163.0, 106.0, 140.0, 84.0]
2025-05-13 10:28:07,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 16 minutes, 47 seconds)
2025-05-13 10:31:28,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:31:30,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 199.50969 ± 67.604
2025-05-13 10:31:30,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [227.21536, 321.76373, 132.42395, 235.00621, 272.7015, 201.8613, 120.48911, 234.08917, 113.47863, 136.06793]
2025-05-13 10:31:30,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 137.0, 78.0, 107.0, 116.0, 101.0, 70.0, 108.0, 83.0, 81.0]
2025-05-13 10:31:30,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 13 minutes, 12 seconds)
2025-05-13 10:34:50,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:34:52,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 217.42314 ± 64.464
2025-05-13 10:34:52,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [279.63245, 196.68808, 55.63286, 198.75146, 199.04343, 284.42468, 193.96368, 263.98633, 272.54813, 229.56042]
2025-05-13 10:34:52,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 96.0, 50.0, 93.0, 90.0, 129.0, 99.0, 115.0, 121.0, 112.0]
2025-05-13 10:34:52,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 9 minutes, 47 seconds)
2025-05-13 10:38:14,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:38:16,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 227.44475 ± 83.463
2025-05-13 10:38:16,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [198.54395, 215.53935, 185.53503, 294.15836, 133.98923, 335.09802, 130.0306, 188.43767, 400.45538, 192.65999]
2025-05-13 10:38:16,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 117.0, 103.0, 144.0, 77.0, 168.0, 80.0, 97.0, 164.0, 95.0]
2025-05-13 10:38:16,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 6 minutes, 49 seconds)
2025-05-13 10:41:36,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:41:38,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 224.21782 ± 39.975
2025-05-13 10:41:38,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [212.6771, 206.29819, 245.18755, 318.8701, 192.63602, 257.51965, 174.88176, 191.23553, 205.89108, 236.98128]
2025-05-13 10:41:38,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 109.0, 108.0, 128.0, 94.0, 113.0, 89.0, 92.0, 109.0, 112.0]
2025-05-13 10:41:38,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 3 minutes, 11 seconds)
2025-05-13 10:44:58,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:44:59,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 187.96091 ± 55.381
2025-05-13 10:44:59,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [154.87636, 246.52002, 280.5183, 166.62396, 179.70314, 187.45183, 129.44409, 104.84805, 266.20364, 163.41972]
2025-05-13 10:44:59,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 136.0, 124.0, 92.0, 92.0, 109.0, 74.0, 71.0, 118.0, 98.0]
2025-05-13 10:44:59,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 59 minutes, 39 seconds)
2025-05-13 10:48:21,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:48:22,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 154.10585 ± 41.127
2025-05-13 10:48:22,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [105.52315, 218.02133, 158.96945, 148.76848, 161.34056, 104.0009, 133.9605, 197.3922, 209.27844, 103.80348]
2025-05-13 10:48:22,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 117.0, 87.0, 83.0, 90.0, 67.0, 77.0, 103.0, 107.0, 67.0]
2025-05-13 10:48:22,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 56 minutes, 14 seconds)
2025-05-13 10:51:46,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:51:48,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 209.04141 ± 56.360
2025-05-13 10:51:48,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [254.02284, 111.41651, 238.95552, 240.87392, 189.9247, 158.59784, 203.73532, 178.70822, 186.99416, 327.185]
2025-05-13 10:51:48,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 71.0, 123.0, 117.0, 101.0, 87.0, 99.0, 96.0, 101.0, 149.0]
2025-05-13 10:51:48,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 53 minutes, 44 seconds)
2025-05-13 10:55:06,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:55:08,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 223.96431 ± 49.007
2025-05-13 10:55:08,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [296.8504, 274.4381, 266.81772, 165.76581, 177.39119, 273.48813, 237.13358, 199.96109, 190.47003, 157.32695]
2025-05-13 10:55:08,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 133.0, 133.0, 93.0, 106.0, 132.0, 127.0, 114.0, 104.0, 90.0]
2025-05-13 10:55:08,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 49 minutes, 27 seconds)
2025-05-13 10:58:31,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:58:33,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 239.77084 ± 72.669
2025-05-13 10:58:33,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [157.3555, 324.32858, 153.03519, 155.51526, 243.4259, 241.94081, 362.97998, 244.77628, 322.03912, 192.3119]
2025-05-13 10:58:33,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 153.0, 95.0, 88.0, 133.0, 129.0, 187.0, 121.0, 165.0, 100.0]
2025-05-13 10:58:33,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 46 minutes, 37 seconds)
2025-05-13 11:01:55,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:01:57,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 297.96259 ± 64.976
2025-05-13 11:01:57,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [269.73743, 279.28183, 324.5075, 287.55713, 275.23093, 343.49167, 309.5187, 180.2024, 447.65604, 262.44223]
2025-05-13 11:01:57,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 116.0, 137.0, 135.0, 122.0, 153.0, 138.0, 101.0, 176.0, 128.0]
2025-05-13 11:01:57,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 43 minutes, 52 seconds)
2025-05-13 11:05:17,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:05:19,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 233.17944 ± 59.024
2025-05-13 11:05:19,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [209.23625, 315.6106, 168.29091, 200.81264, 215.77354, 178.0424, 297.13126, 215.48747, 187.1232, 344.28625]
2025-05-13 11:05:19,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 138.0, 91.0, 100.0, 107.0, 97.0, 195.0, 116.0, 98.0, 169.0]
2025-05-13 11:05:19,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 40 minutes, 21 seconds)
2025-05-13 11:08:41,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:08:42,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 182.48738 ± 62.904
2025-05-13 11:08:42,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [70.29189, 141.76869, 212.2957, 294.17715, 146.19106, 237.90326, 142.28238, 232.85432, 133.191, 213.91837]
2025-05-13 11:08:42,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 81.0, 107.0, 138.0, 89.0, 121.0, 87.0, 121.0, 80.0, 106.0]
2025-05-13 11:08:42,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 36 minutes, 24 seconds)
2025-05-13 11:12:04,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:12:05,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 193.39731 ± 38.280
2025-05-13 11:12:05,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [238.09224, 160.8823, 165.0219, 162.16112, 205.16351, 230.91544, 198.17365, 201.22559, 249.74176, 122.59564]
2025-05-13 11:12:05,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 87.0, 92.0, 92.0, 118.0, 113.0, 102.0, 106.0, 118.0, 71.0]
2025-05-13 11:12:05,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 33 minutes, 34 seconds)
2025-05-13 11:15:28,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:15:30,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 251.08244 ± 38.678
2025-05-13 11:15:30,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [270.926, 249.20955, 291.93503, 205.7493, 212.11702, 253.47725, 218.3638, 315.94427, 201.85257, 291.24966]
2025-05-13 11:15:30,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 129.0, 147.0, 116.0, 114.0, 127.0, 120.0, 152.0, 111.0, 133.0]
2025-05-13 11:15:30,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 30 minutes, 14 seconds)
2025-05-13 11:18:52,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:18:54,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 339.15219 ± 106.985
2025-05-13 11:18:54,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [359.70163, 191.442, 266.89822, 479.3611, 569.4062, 378.72324, 289.41486, 292.75955, 253.84334, 309.97195]
2025-05-13 11:18:54,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 104.0, 138.0, 194.0, 220.0, 170.0, 136.0, 154.0, 133.0, 148.0]
2025-05-13 11:18:54,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (339.15) for latency MM1Queue_a033_s075
2025-05-13 11:18:54,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 26 minutes, 48 seconds)
2025-05-13 11:22:15,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:22:18,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 288.15723 ± 73.981
2025-05-13 11:22:18,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [297.1614, 219.60066, 297.73364, 301.3903, 350.04782, 420.5311, 134.31703, 233.70088, 327.84158, 299.24777]
2025-05-13 11:22:18,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 113.0, 141.0, 150.0, 173.0, 185.0, 82.0, 121.0, 146.0, 146.0]
2025-05-13 11:22:18,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 23 minutes, 41 seconds)
2025-05-13 11:25:38,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:25:40,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 229.29221 ± 115.016
2025-05-13 11:25:40,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [157.57324, 349.45752, 156.92905, 174.4905, 180.18625, 483.209, 145.60617, 129.36351, 160.75348, 355.35345]
2025-05-13 11:25:40,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 165.0, 90.0, 99.0, 97.0, 191.0, 83.0, 79.0, 88.0, 163.0]
2025-05-13 11:25:40,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 20 minutes, 2 seconds)
2025-05-13 11:29:01,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:29:03,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 196.54872 ± 55.784
2025-05-13 11:29:03,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [171.70067, 100.2172, 175.95949, 307.6566, 238.7065, 209.99223, 209.82368, 244.38731, 166.30931, 140.73427]
2025-05-13 11:29:03,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 65.0, 96.0, 141.0, 119.0, 112.0, 124.0, 126.0, 91.0, 80.0]
2025-05-13 11:29:03,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 16 minutes, 43 seconds)
2025-05-13 11:32:24,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:32:27,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 316.53595 ± 135.490
2025-05-13 11:32:27,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [279.32483, 582.70514, 307.69342, 369.2942, 397.1327, 406.63635, 114.80547, 101.67683, 360.69153, 245.39902]
2025-05-13 11:32:27,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 227.0, 141.0, 165.0, 184.0, 177.0, 68.0, 62.0, 163.0, 127.0]
2025-05-13 11:32:27,301 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 13 minutes, 12 seconds)
2025-05-13 11:35:48,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:35:49,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 206.64812 ± 57.433
2025-05-13 11:35:49,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [217.533, 167.92995, 176.30151, 327.9436, 124.10265, 242.38951, 139.14746, 206.62582, 202.45346, 262.05426]
2025-05-13 11:35:49,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 94.0, 101.0, 153.0, 73.0, 121.0, 81.0, 102.0, 98.0, 120.0]
2025-05-13 11:35:49,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 9 minutes, 30 seconds)
2025-05-13 11:39:11,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:39:13,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 315.35800 ± 121.275
2025-05-13 11:39:13,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [256.33594, 357.8797, 376.19855, 220.41049, 610.4161, 225.0425, 376.2381, 268.88986, 314.09552, 148.07338]
2025-05-13 11:39:13,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 145.0, 154.0, 119.0, 225.0, 113.0, 144.0, 117.0, 145.0, 90.0]
2025-05-13 11:39:13,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 6 minutes, 6 seconds)
2025-05-13 11:42:35,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:42:38,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 327.46710 ± 62.486
2025-05-13 11:42:38,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [368.478, 334.36884, 340.94293, 352.4585, 350.75815, 426.64508, 326.54944, 331.83984, 179.04427, 263.58618]
2025-05-13 11:42:38,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [167.0, 146.0, 158.0, 153.0, 167.0, 180.0, 139.0, 150.0, 92.0, 126.0]
2025-05-13 11:42:38,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 3 minutes, 15 seconds)
2025-05-13 11:45:58,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:46:00,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 246.37265 ± 104.913
2025-05-13 11:46:00,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [137.68404, 405.01523, 226.394, 186.1611, 198.92798, 271.09485, 298.49097, 142.47987, 145.10678, 452.37158]
2025-05-13 11:46:00,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [76.0, 172.0, 107.0, 98.0, 101.0, 120.0, 125.0, 79.0, 86.0, 173.0]
2025-05-13 11:46:00,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 59 minutes, 44 seconds)
2025-05-13 11:49:21,522 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:49:23,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 310.46866 ± 73.468
2025-05-13 11:49:23,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [366.72702, 165.33896, 417.2088, 336.0674, 346.69992, 278.9181, 316.49097, 210.75333, 380.01105, 286.471]
2025-05-13 11:49:23,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 95.0, 165.0, 151.0, 144.0, 119.0, 144.0, 108.0, 155.0, 118.0]
2025-05-13 11:49:23,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 56 minutes, 11 seconds)
2025-05-13 11:52:47,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:52:49,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 256.73923 ± 94.932
2025-05-13 11:52:49,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [157.60202, 386.56152, 90.16244, 234.89363, 300.10602, 340.21262, 340.09134, 266.4793, 132.39978, 318.8836]
2025-05-13 11:52:49,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 171.0, 60.0, 114.0, 130.0, 149.0, 153.0, 130.0, 82.0, 142.0]
2025-05-13 11:52:49,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 53 minutes, 19 seconds)
2025-05-13 11:56:10,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:56:12,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 260.49728 ± 81.519
2025-05-13 11:56:12,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [339.84738, 129.82787, 281.71594, 146.71126, 311.15817, 324.5712, 145.25629, 267.697, 340.69955, 317.48828]
2025-05-13 11:56:12,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 77.0, 131.0, 84.0, 147.0, 150.0, 85.0, 125.0, 145.0, 144.0]
2025-05-13 11:56:12,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 49 minutes, 54 seconds)
2025-05-13 11:59:34,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:59:36,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 190.66701 ± 49.454
2025-05-13 11:59:36,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [158.32277, 189.53317, 169.93048, 179.92043, 196.31122, 182.88017, 333.34158, 183.8839, 154.86375, 157.68251]
2025-05-13 11:59:36,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 101.0, 93.0, 97.0, 115.0, 98.0, 145.0, 101.0, 90.0, 84.0]
2025-05-13 11:59:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 46 minutes, 18 seconds)
2025-05-13 12:02:54,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:02:56,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 230.83980 ± 80.586
2025-05-13 12:02:56,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [232.11617, 311.9457, 191.92436, 296.82175, 193.97824, 268.60797, 389.77148, 154.61674, 133.26704, 135.34851]
2025-05-13 12:02:56,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 137.0, 101.0, 130.0, 101.0, 122.0, 168.0, 89.0, 79.0, 78.0]
2025-05-13 12:02:56,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 42 minutes, 32 seconds)
2025-05-13 12:06:19,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:06:21,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 288.33563 ± 74.029
2025-05-13 12:06:21,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [246.3291, 276.78687, 329.39948, 262.05612, 409.07068, 372.67624, 358.5063, 162.39009, 264.2436, 201.89801]
2025-05-13 12:06:21,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 125.0, 127.0, 116.0, 167.0, 154.0, 141.0, 89.0, 113.0, 102.0]
2025-05-13 12:06:21,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 39 minutes, 30 seconds)
2025-05-13 12:09:42,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:09:44,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 166.29756 ± 71.292
2025-05-13 12:09:44,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [137.95918, 113.10569, 111.74015, 109.816284, 150.81306, 170.60219, 194.87062, 315.69492, 272.19028, 86.18313]
2025-05-13 12:09:44,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 80.0, 70.0, 78.0, 87.0, 88.0, 103.0, 141.0, 125.0, 60.0]
2025-05-13 12:09:44,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 35 minutes, 39 seconds)
2025-05-13 12:13:05,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:13:07,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 246.34148 ± 85.977
2025-05-13 12:13:07,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [360.5857, 137.59958, 103.84595, 229.76367, 205.81815, 307.5711, 356.18826, 289.12137, 169.44812, 303.4729]
2025-05-13 12:13:07,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [163.0, 81.0, 73.0, 116.0, 104.0, 148.0, 155.0, 137.0, 96.0, 148.0]
2025-05-13 12:13:07,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 32 minutes, 9 seconds)
2025-05-13 12:16:28,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:16:30,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 291.71790 ± 68.566
2025-05-13 12:16:30,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [332.1045, 272.799, 296.47137, 301.4683, 350.3667, 371.02197, 146.94104, 195.1909, 361.67148, 289.14377]
2025-05-13 12:16:30,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 114.0, 124.0, 136.0, 148.0, 152.0, 83.0, 103.0, 141.0, 124.0]
2025-05-13 12:16:30,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 28 minutes, 41 seconds)
2025-05-13 12:19:50,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:19:52,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 260.72787 ± 121.851
2025-05-13 12:19:52,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [207.72505, 359.4311, 156.49385, 304.48514, 101.74537, 353.9628, 500.58957, 219.70316, 308.71933, 94.42347]
2025-05-13 12:19:52,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 150.0, 89.0, 136.0, 67.0, 163.0, 219.0, 110.0, 131.0, 63.0]
2025-05-13 12:19:52,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 25 minutes, 34 seconds)
2025-05-13 12:23:12,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:23:14,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 337.55695 ± 103.047
2025-05-13 12:23:14,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [376.0343, 309.4019, 248.52417, 592.6887, 347.3661, 253.78078, 199.73953, 359.32642, 298.41586, 390.29193]
2025-05-13 12:23:14,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [164.0, 145.0, 132.0, 213.0, 170.0, 119.0, 101.0, 157.0, 137.0, 167.0]
2025-05-13 12:23:14,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 21 minutes, 47 seconds)
2025-05-13 12:26:35,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:26:37,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 312.54776 ± 131.529
2025-05-13 12:26:37,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [527.5684, 344.32007, 141.5194, 372.56522, 194.26044, 291.8019, 259.60367, 148.79712, 533.5974, 311.4438]
2025-05-13 12:26:37,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [218.0, 149.0, 84.0, 154.0, 97.0, 141.0, 120.0, 84.0, 215.0, 136.0]
2025-05-13 12:26:37,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 18 minutes, 25 seconds)
2025-05-13 12:29:55,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:29:57,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 318.16180 ± 79.713
2025-05-13 12:29:57,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [191.35303, 455.00345, 337.8595, 336.89377, 284.8587, 303.4896, 298.19244, 263.54614, 253.85808, 456.56323]
2025-05-13 12:29:57,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 177.0, 138.0, 138.0, 133.0, 137.0, 134.0, 129.0, 137.0, 184.0]
2025-05-13 12:29:57,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 14 minutes, 45 seconds)
2025-05-13 12:33:20,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:33:22,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 332.81931 ± 113.136
2025-05-13 12:33:22,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [584.3319, 238.3834, 409.36014, 250.97603, 265.4715, 402.68195, 310.53525, 311.17804, 393.38104, 161.89372]
2025-05-13 12:33:22,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 122.0, 172.0, 128.0, 149.0, 190.0, 143.0, 144.0, 176.0, 89.0]
2025-05-13 12:33:22,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 11 minutes, 34 seconds)
2025-05-13 12:36:38,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:36:40,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 364.99527 ± 145.309
2025-05-13 12:36:40,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [508.25757, 430.36603, 339.07208, 349.7698, 229.07883, 169.05342, 414.18045, 622.91284, 450.87708, 136.38431]
2025-05-13 12:36:40,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 178.0, 148.0, 150.0, 114.0, 92.0, 168.0, 230.0, 190.0, 86.0]
2025-05-13 12:36:40,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (365.00) for latency MM1Queue_a033_s075
2025-05-13 12:36:40,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 7 minutes, 45 seconds)
2025-05-13 12:40:01,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:40:03,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 317.10257 ± 84.143
2025-05-13 12:40:03,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [252.97821, 392.81183, 290.42572, 194.65024, 436.6996, 329.3068, 362.9307, 402.3564, 172.04832, 336.81787]
2025-05-13 12:40:03,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 168.0, 140.0, 93.0, 183.0, 143.0, 154.0, 161.0, 98.0, 140.0]
2025-05-13 12:40:03,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 4 minutes, 25 seconds)
2025-05-13 12:43:22,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:43:24,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 294.97406 ± 160.569
2025-05-13 12:43:24,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [342.13715, 376.58673, 75.57714, 386.02106, 172.24738, 607.0888, 97.99149, 398.20117, 133.00531, 360.8842]
2025-05-13 12:43:24,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 154.0, 53.0, 158.0, 91.0, 233.0, 63.0, 164.0, 74.0, 149.0]
2025-05-13 12:43:24,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 49 seconds)
2025-05-13 12:46:41,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:46:44,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 346.65143 ± 76.491
2025-05-13 12:46:44,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [351.023, 386.99634, 242.5896, 525.73303, 241.2005, 357.3939, 340.78415, 367.79196, 299.7244, 353.27725]
2025-05-13 12:46:44,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 168.0, 117.0, 207.0, 112.0, 145.0, 147.0, 152.0, 126.0, 151.0]
2025-05-13 12:46:44,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 57 minutes, 24 seconds)
2025-05-13 12:50:04,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:50:06,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 326.21649 ± 132.002
2025-05-13 12:50:06,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [178.60951, 402.55304, 263.53516, 235.27501, 184.24158, 196.05614, 410.07782, 368.4144, 610.9716, 412.4304]
2025-05-13 12:50:06,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 180.0, 133.0, 116.0, 97.0, 101.0, 169.0, 150.0, 224.0, 176.0]
2025-05-13 12:50:06,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 53 minutes, 46 seconds)
2025-05-13 12:53:24,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:53:26,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 249.14705 ± 146.808
2025-05-13 12:53:26,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [149.59692, 631.0889, 256.5991, 198.87946, 166.20967, 384.43225, 264.9751, 153.99287, 148.78459, 136.91136]
2025-05-13 12:53:26,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 237.0, 122.0, 101.0, 93.0, 160.0, 121.0, 85.0, 82.0, 78.0]
2025-05-13 12:53:26,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 50 minutes, 37 seconds)
2025-05-13 12:56:48,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:56:50,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 277.05521 ± 108.134
2025-05-13 12:56:50,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [379.14835, 283.39706, 79.57533, 382.3348, 186.66934, 361.14325, 171.36903, 175.02719, 377.74875, 374.13898]
2025-05-13 12:56:50,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [171.0, 126.0, 55.0, 173.0, 101.0, 155.0, 94.0, 101.0, 156.0, 165.0]
2025-05-13 12:56:50,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 47 minutes, 25 seconds)
2025-05-13 13:00:06,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:00:09,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 423.11752 ± 141.241
2025-05-13 13:00:09,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [660.24194, 416.838, 303.13535, 397.90198, 151.09799, 537.53827, 356.99597, 607.69806, 427.97336, 371.75412]
2025-05-13 13:00:09,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [259.0, 194.0, 142.0, 168.0, 87.0, 222.0, 158.0, 228.0, 183.0, 152.0]
2025-05-13 13:00:09,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (423.12) for latency MM1Queue_a033_s075
2025-05-13 13:00:09,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 43 minutes, 54 seconds)
2025-05-13 13:03:33,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:03:35,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 257.05402 ± 88.901
2025-05-13 13:03:35,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [243.53055, 107.80003, 354.0062, 291.7259, 291.91553, 313.49896, 352.79803, 331.0336, 158.07103, 126.160515]
2025-05-13 13:03:35,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 68.0, 158.0, 142.0, 144.0, 153.0, 160.0, 156.0, 85.0, 73.0]
2025-05-13 13:03:35,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 41 minutes, 10 seconds)
2025-05-13 13:06:53,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:06:55,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 226.65215 ± 69.622
2025-05-13 13:06:55,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [212.04147, 319.29953, 192.81789, 185.41687, 185.74948, 349.6988, 99.99323, 274.69678, 252.07799, 194.7295]
2025-05-13 13:06:55,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 145.0, 104.0, 100.0, 95.0, 158.0, 63.0, 135.0, 122.0, 106.0]
2025-05-13 13:06:55,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 37 minutes, 34 seconds)
2025-05-13 13:10:12,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:10:15,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 333.31491 ± 145.521
2025-05-13 13:10:15,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [362.45236, 559.1349, 382.9726, 488.62598, 106.41149, 303.40204, 472.2634, 163.68771, 152.21469, 341.98404]
2025-05-13 13:10:15,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [162.0, 228.0, 168.0, 201.0, 65.0, 141.0, 202.0, 90.0, 87.0, 157.0]
2025-05-13 13:10:15,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 34 minutes, 8 seconds)
2025-05-13 13:13:32,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:13:34,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 338.50589 ± 107.402
2025-05-13 13:13:34,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [345.36417, 327.9259, 592.3076, 342.83105, 301.97107, 341.3463, 402.46387, 354.05103, 185.19975, 191.59824]
2025-05-13 13:13:34,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [154.0, 146.0, 230.0, 147.0, 137.0, 147.0, 169.0, 157.0, 100.0, 99.0]
2025-05-13 13:13:34,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 30 minutes, 22 seconds)
2025-05-13 13:16:56,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:16:59,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 315.64178 ± 83.475
2025-05-13 13:16:59,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [264.92102, 389.9525, 178.45036, 330.7339, 220.21521, 341.14484, 261.76517, 401.11728, 465.33084, 302.7869]
2025-05-13 13:16:59,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 173.0, 96.0, 147.0, 118.0, 158.0, 124.0, 170.0, 190.0, 138.0]
2025-05-13 13:16:59,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 27 minutes, 29 seconds)
2025-05-13 13:20:18,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:20:20,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 276.88571 ± 84.135
2025-05-13 13:20:20,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [186.13039, 387.30463, 357.36105, 145.35568, 163.26183, 258.2709, 302.49567, 310.61926, 273.83313, 384.22495]
2025-05-13 13:20:20,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 166.0, 157.0, 86.0, 89.0, 129.0, 140.0, 143.0, 135.0, 163.0]
2025-05-13 13:20:20,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 23 minutes, 42 seconds)
2025-05-13 13:23:40,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:23:43,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 363.64670 ± 96.688
2025-05-13 13:23:43,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [379.94183, 255.93285, 520.5598, 397.13303, 327.49075, 501.50598, 378.58105, 396.6318, 198.41762, 280.272]
2025-05-13 13:23:43,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [178.0, 124.0, 215.0, 171.0, 150.0, 216.0, 159.0, 170.0, 102.0, 137.0]
2025-05-13 13:23:43,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 20 minutes, 35 seconds)
2025-05-13 13:27:00,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:27:03,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 397.46658 ± 96.514
2025-05-13 13:27:03,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [229.75125, 438.78107, 538.03314, 230.79659, 414.33594, 470.66266, 365.59827, 455.60162, 465.9716, 365.13354]
2025-05-13 13:27:03,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 196.0, 230.0, 126.0, 185.0, 207.0, 163.0, 200.0, 201.0, 155.0]
2025-05-13 13:27:03,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 17 minutes, 19 seconds)
2025-05-13 13:30:22,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:30:24,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 396.47284 ± 105.394
2025-05-13 13:30:24,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [383.46982, 364.87863, 168.96156, 348.9697, 526.6105, 521.9462, 421.16296, 354.24683, 530.77515, 343.70706]
2025-05-13 13:30:24,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [167.0, 159.0, 87.0, 151.0, 212.0, 214.0, 181.0, 154.0, 213.0, 155.0]
2025-05-13 13:30:24,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 14 minutes, 4 seconds)
2025-05-13 13:33:45,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:33:48,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 388.27057 ± 134.880
2025-05-13 13:33:48,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [348.12506, 361.43793, 543.7582, 375.86417, 352.09973, 384.92557, 380.39325, 451.05475, 612.48065, 72.56612]
2025-05-13 13:33:48,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 161.0, 220.0, 163.0, 152.0, 166.0, 167.0, 187.0, 223.0, 49.0]
2025-05-13 13:33:48,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 10 minutes, 37 seconds)
2025-05-13 13:37:05,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:37:08,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 372.14996 ± 126.604
2025-05-13 13:37:08,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [399.12, 438.5678, 306.40335, 363.91232, 399.18082, 417.1055, 633.3716, 149.88828, 411.55002, 202.4002]
2025-05-13 13:37:08,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 185.0, 141.0, 164.0, 170.0, 182.0, 236.0, 87.0, 180.0, 105.0]
2025-05-13 13:37:08,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 7 minutes, 10 seconds)
2025-05-13 13:40:27,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:40:30,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 325.36777 ± 74.255
2025-05-13 13:40:30,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [487.6338, 280.66537, 336.59958, 291.37766, 181.86629, 330.52652, 292.80804, 321.19952, 348.0434, 382.95752]
2025-05-13 13:40:30,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 132.0, 144.0, 136.0, 99.0, 147.0, 134.0, 148.0, 151.0, 165.0]
2025-05-13 13:40:30,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 3 minutes, 47 seconds)
2025-05-13 13:43:48,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:43:50,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 357.50473 ± 46.675
2025-05-13 13:43:50,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [350.98187, 329.4436, 372.5015, 400.3454, 310.24216, 367.4535, 471.54858, 329.16455, 331.35056, 312.01566]
2025-05-13 13:43:50,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [154.0, 148.0, 163.0, 170.0, 145.0, 165.0, 196.0, 151.0, 148.0, 142.0]
2025-05-13 13:43:50,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 25 seconds)
2025-05-13 13:47:12,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:47:14,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 335.16586 ± 137.740
2025-05-13 13:47:14,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [351.39145, 175.71318, 388.88327, 177.03674, 297.26285, 522.8953, 388.1645, 154.55513, 586.8549, 308.90143]
2025-05-13 13:47:14,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 93.0, 160.0, 103.0, 138.0, 211.0, 169.0, 84.0, 230.0, 139.0]
2025-05-13 13:47:14,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 57 minutes, 12 seconds)
2025-05-13 13:50:33,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:50:36,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 511.48700 ± 238.335
2025-05-13 13:50:36,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [311.18127, 169.02707, 506.73813, 647.2439, 629.9135, 300.55136, 557.64465, 541.1266, 1071.3094, 380.1344]
2025-05-13 13:50:36,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 94.0, 209.0, 246.0, 241.0, 145.0, 212.0, 206.0, 390.0, 166.0]
2025-05-13 13:50:36,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (511.49) for latency MM1Queue_a033_s075
2025-05-13 13:50:36,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 53 minutes, 48 seconds)
2025-05-13 13:53:55,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:53:57,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 381.59415 ± 91.424
2025-05-13 13:53:57,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [372.3678, 465.65634, 401.34436, 203.51671, 395.56573, 439.40274, 390.67303, 222.50359, 428.37204, 496.53918]
2025-05-13 13:53:57,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [162.0, 185.0, 167.0, 114.0, 170.0, 179.0, 166.0, 113.0, 175.0, 197.0]
2025-05-13 13:53:57,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 50 minutes, 29 seconds)
2025-05-13 13:57:20,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:57:23,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 427.59784 ± 200.697
2025-05-13 13:57:23,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [376.64072, 149.81682, 427.04953, 618.5518, 378.62347, 151.4939, 883.186, 421.58548, 445.59103, 423.4395]
2025-05-13 13:57:23,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 79.0, 187.0, 243.0, 156.0, 83.0, 327.0, 197.0, 177.0, 173.0]
2025-05-13 13:57:23,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 47 minutes, 16 seconds)
2025-05-13 14:00:41,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:00:43,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 306.47610 ± 132.881
2025-05-13 14:00:43,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [192.92723, 474.4222, 171.49918, 256.59787, 577.40265, 415.9191, 316.90546, 227.45906, 158.37823, 273.25003]
2025-05-13 14:00:43,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 197.0, 93.0, 122.0, 220.0, 172.0, 140.0, 112.0, 85.0, 128.0]
2025-05-13 14:00:43,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 43 minutes, 52 seconds)
2025-05-13 14:04:01,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:04:03,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 351.47302 ± 151.533
2025-05-13 14:04:03,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [366.96365, 517.0486, 374.88544, 390.34155, 376.6786, 578.8854, 77.30362, 84.41981, 369.21783, 378.98575]
2025-05-13 14:04:03,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [161.0, 197.0, 162.0, 163.0, 157.0, 218.0, 53.0, 60.0, 162.0, 164.0]
2025-05-13 14:04:03,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 40 minutes, 21 seconds)
2025-05-13 14:07:26,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:07:30,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 512.42938 ± 346.180
2025-05-13 14:07:30,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [751.91046, 146.58928, 1060.9218, 184.72685, 395.08774, 634.4488, 91.502975, 586.4216, 213.87045, 1058.8141]
2025-05-13 14:07:30,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [288.0, 84.0, 407.0, 101.0, 166.0, 236.0, 66.0, 232.0, 119.0, 397.0]
2025-05-13 14:07:30,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (512.43) for latency MM1Queue_a033_s075
2025-05-13 14:07:30,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 37 minutes, 9 seconds)
2025-05-13 14:10:46,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:10:49,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 363.69080 ± 50.998
2025-05-13 14:10:49,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [415.52484, 381.1658, 405.32516, 396.3089, 314.64465, 429.30078, 277.40695, 343.89404, 290.78275, 382.55396]
2025-05-13 14:10:49,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [174.0, 158.0, 169.0, 164.0, 144.0, 179.0, 128.0, 147.0, 142.0, 162.0]
2025-05-13 14:10:49,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 33 minutes, 43 seconds)
2025-05-13 14:14:08,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:14:11,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 461.58868 ± 139.252
2025-05-13 14:14:11,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [587.74384, 589.597, 516.38025, 498.5433, 121.63388, 411.74692, 374.7878, 473.48688, 631.56476, 410.40173]
2025-05-13 14:14:11,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [214.0, 244.0, 199.0, 201.0, 71.0, 168.0, 159.0, 210.0, 229.0, 173.0]
2025-05-13 14:14:11,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 30 minutes, 14 seconds)
2025-05-13 14:17:34,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:17:36,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 362.72382 ± 139.850
2025-05-13 14:17:36,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [480.04675, 600.0816, 338.57092, 110.3948, 347.33026, 149.78166, 484.17938, 391.66705, 357.38715, 367.7986]
2025-05-13 14:17:36,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 236.0, 148.0, 66.0, 153.0, 82.0, 199.0, 165.0, 149.0, 157.0]
2025-05-13 14:17:36,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 27 minutes, 1 second)
2025-05-13 14:20:59,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:21:01,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 425.52286 ± 180.584
2025-05-13 14:21:01,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [396.89273, 774.00256, 333.47955, 590.7613, 380.4256, 411.08194, 507.49933, 547.21454, 153.48549, 160.3853]
2025-05-13 14:21:01,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 262.0, 149.0, 231.0, 167.0, 181.0, 197.0, 223.0, 87.0, 87.0]
2025-05-13 14:21:01,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes, 45 seconds)
2025-05-13 14:24:13,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:24:15,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 299.13278 ± 124.944
2025-05-13 14:24:15,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [324.9397, 426.30658, 268.5995, 319.0255, 378.7793, 346.8971, 76.92843, 464.28815, 315.35037, 70.21336]
2025-05-13 14:24:15,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [142.0, 187.0, 131.0, 146.0, 164.0, 154.0, 53.0, 192.0, 147.0, 47.0]
2025-05-13 14:24:15,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 20 minutes, 6 seconds)
2025-05-13 14:27:36,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:27:39,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 333.93182 ± 149.373
2025-05-13 14:27:39,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [531.4596, 260.1846, 131.5757, 372.39868, 171.28136, 444.86768, 573.7761, 367.01398, 349.34622, 137.4143]
2025-05-13 14:27:39,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [212.0, 121.0, 75.0, 145.0, 97.0, 180.0, 224.0, 161.0, 144.0, 77.0]
2025-05-13 14:27:39,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 49 seconds)
2025-05-13 14:30:58,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:31:01,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 413.46906 ± 194.725
2025-05-13 14:31:01,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [385.27863, 609.8094, 651.14246, 154.40884, 539.73486, 160.37296, 171.87788, 417.6068, 686.8352, 357.62317]
2025-05-13 14:31:01,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [165.0, 222.0, 252.0, 85.0, 198.0, 96.0, 102.0, 171.0, 266.0, 158.0]
2025-05-13 14:31:01,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 28 seconds)
2025-05-13 14:34:21,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:34:24,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 457.02026 ± 146.457
2025-05-13 14:34:24,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [179.33789, 648.15814, 391.87665, 403.6706, 389.18466, 337.19724, 574.3698, 394.40018, 626.9682, 625.0394]
2025-05-13 14:34:24,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 254.0, 166.0, 169.0, 163.0, 148.0, 224.0, 163.0, 235.0, 243.0]
2025-05-13 14:34:24,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 10 minutes, 4 seconds)
2025-05-13 14:37:42,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:37:45,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 447.64902 ± 226.262
2025-05-13 14:37:45,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [181.32143, 566.67236, 136.80225, 933.3702, 650.26556, 366.1411, 344.40045, 322.56464, 394.94794, 580.0044]
2025-05-13 14:37:45,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 218.0, 83.0, 331.0, 244.0, 156.0, 151.0, 148.0, 169.0, 231.0]
2025-05-13 14:37:45,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 41 seconds)
2025-05-13 14:41:07,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:41:10,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 400.93842 ± 172.929
2025-05-13 14:41:10,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [377.3811, 387.308, 163.56311, 412.57538, 363.1426, 144.2852, 366.32034, 605.3448, 426.75665, 762.70685]
2025-05-13 14:41:10,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [169.0, 174.0, 93.0, 180.0, 165.0, 84.0, 160.0, 240.0, 180.0, 286.0]
2025-05-13 14:41:10,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 22 seconds)
2025-05-13 14:44:26,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:44:29,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 411.52206 ± 104.026
2025-05-13 14:44:29,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [486.49475, 376.0542, 481.3435, 376.4513, 505.95737, 349.14365, 376.65326, 418.4304, 569.9409, 174.75146]
2025-05-13 14:44:29,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 163.0, 193.0, 164.0, 211.0, 158.0, 159.0, 168.0, 221.0, 100.0]
2025-05-13 14:44:29,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1251 [DEBUG]: Training session finished
