2025-08-07 09:27:01,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc25-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:27:01,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc25-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:27:01,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14d1fb593fd0>}
2025-08-07 09:27:01,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 09:27:01,636 baseline-bpql-noiseperc25-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 09:27:01,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 09:27:01,651 baseline-bpql-noiseperc25-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 09:27:01,652 baseline-bpql-noiseperc25-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 09:27:02,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 09:27:02,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 09:28:38,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:28:50,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -305.81256 ± 32.321
2025-08-07 09:28:50,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-340.12747, -268.11856, -302.114, -287.28674, -263.7156, -321.03815, -359.80817, -328.48178, -264.01004, -323.4251]
2025-08-07 09:28:50,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:28:50,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-305.81) for latency MM1Queue_a033_s075
2025-08-07 09:28:50,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 57 minutes, 33 seconds)
2025-08-07 09:30:31,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:30:43,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -247.12509 ± 59.852
2025-08-07 09:30:43,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-268.9368, -96.56662, -316.6695, -231.90459, -249.12022, -214.47557, -281.12228, -227.74564, -270.92242, -313.78732]
2025-08-07 09:30:43,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:30:43,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-247.13) for latency MM1Queue_a033_s075
2025-08-07 09:30:43,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 17 seconds)
2025-08-07 09:32:24,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:32:36,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -248.45187 ± 39.006
2025-08-07 09:32:36,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-212.73944, -174.63034, -241.52899, -225.32579, -277.14844, -302.93967, -272.9692, -304.93097, -239.38467, -232.92122]
2025-08-07 09:32:36,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:32:36,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 59 minutes, 55 seconds)
2025-08-07 09:34:17,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:34:29,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -147.92287 ± 88.909
2025-08-07 09:34:29,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-64.319916, -31.779736, -194.67007, -91.289215, -220.96219, -170.0535, -285.13937, -261.38956, -136.24982, -23.375343]
2025-08-07 09:34:29,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:34:29,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-147.92) for latency MM1Queue_a033_s075
2025-08-07 09:34:29,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 58 minutes, 49 seconds)
2025-08-07 09:36:11,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:36:22,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -139.36929 ± 86.655
2025-08-07 09:36:22,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-168.62799, -209.706, -224.8543, -162.49207, -183.65794, -219.57729, -83.308556, 71.60443, -146.79755, -66.27565]
2025-08-07 09:36:22,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:36:22,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-139.37) for latency MM1Queue_a033_s075
2025-08-07 09:36:22,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 57 minutes, 23 seconds)
2025-08-07 09:38:04,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:15,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -120.75533 ± 123.581
2025-08-07 09:38:15,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-53.430035, -254.39987, -98.730064, 132.9582, -243.0721, -288.62015, -81.33874, -162.66228, -167.32646, 9.067958]
2025-08-07 09:38:15,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:38:15,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-120.76) for latency MM1Queue_a033_s075
2025-08-07 09:38:15,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 57 minutes, 13 seconds)
2025-08-07 09:39:57,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:40:08,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -113.53662 ± 45.339
2025-08-07 09:40:08,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-117.042595, -171.46875, -68.57161, -123.24947, -149.27786, -116.459656, -86.41264, -192.5116, -53.944252, -56.427887]
2025-08-07 09:40:08,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:40:08,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-113.54) for latency MM1Queue_a033_s075
2025-08-07 09:40:08,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 55 minutes, 18 seconds)
2025-08-07 09:41:50,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:42:01,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -70.68611 ± 60.898
2025-08-07 09:42:01,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-86.24606, -0.9524303, -95.0567, 2.3946989, -150.32808, -95.48682, -39.41595, 23.400936, -104.341484, -160.82924]
2025-08-07 09:42:01,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:42:01,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-70.69) for latency MM1Queue_a033_s075
2025-08-07 09:42:01,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 53 minutes, 24 seconds)
2025-08-07 09:43:43,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:43:54,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 63.84240 ± 116.336
2025-08-07 09:43:54,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [95.735596, 278.59457, 40.60384, 3.518274, -43.51106, -147.76349, 54.46937, 51.014828, 229.09262, 76.66942]
2025-08-07 09:43:54,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:43:54,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (63.84) for latency MM1Queue_a033_s075
2025-08-07 09:43:54,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 51 minutes, 28 seconds)
2025-08-07 09:45:36,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:45:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 143.91843 ± 161.698
2025-08-07 09:45:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [233.91736, 201.66705, 213.23053, -324.76578, 211.82051, 203.34946, 241.28214, 222.27466, 102.79187, 133.61644]
2025-08-07 09:45:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:45:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (143.92) for latency MM1Queue_a033_s075
2025-08-07 09:45:48,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 49 minutes, 36 seconds)
2025-08-07 09:47:29,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:47:41,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 150.76953 ± 200.634
2025-08-07 09:47:41,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [262.06747, 426.86374, 169.1956, 158.05794, -271.63904, -98.80577, 356.34088, 196.74483, 273.62692, 35.24267]
2025-08-07 09:47:41,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:47:41,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (150.77) for latency MM1Queue_a033_s075
2025-08-07 09:47:41,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 47 minutes, 43 seconds)
2025-08-07 09:49:22,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:49:34,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 160.89346 ± 240.839
2025-08-07 09:49:34,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-285.5635, 409.89587, 263.0032, 257.40408, -324.60263, 271.59485, 229.91183, 187.03973, 241.86995, 358.38138]
2025-08-07 09:49:34,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:49:34,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (160.89) for latency MM1Queue_a033_s075
2025-08-07 09:49:34,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 45 minutes, 49 seconds)
2025-08-07 09:51:15,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:51:27,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 315.47656 ± 101.778
2025-08-07 09:51:27,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [451.8883, 166.3176, 122.48436, 429.54767, 316.39142, 371.1124, 256.63486, 355.66956, 380.13684, 304.58243]
2025-08-07 09:51:27,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:51:27,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (315.48) for latency MM1Queue_a033_s075
2025-08-07 09:51:27,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 43 minutes, 56 seconds)
2025-08-07 09:53:08,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:53:20,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 296.20325 ± 184.928
2025-08-07 09:53:20,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [270.3423, 446.91653, 297.17395, 366.14206, 308.16248, 421.67078, 437.90573, -228.60461, 344.33902, 297.98404]
2025-08-07 09:53:20,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:53:20,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 42 minutes, 6 seconds)
2025-08-07 09:55:01,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:55:13,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 372.41583 ± 117.828
2025-08-07 09:55:13,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [515.61, 246.86093, 435.82626, 451.2928, 427.66818, 377.16525, 421.33694, 157.99023, 485.55188, 204.85626]
2025-08-07 09:55:13,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:55:13,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (372.42) for latency MM1Queue_a033_s075
2025-08-07 09:55:13,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 40 minutes, 11 seconds)
2025-08-07 09:56:54,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:57:06,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 191.25427 ± 241.862
2025-08-07 09:57:06,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [292.3506, 403.95566, -199.13861, 289.72717, 528.9604, 174.08524, 138.44405, -250.31615, 119.34123, 415.13315]
2025-08-07 09:57:06,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:57:06,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 38 minutes, 18 seconds)
2025-08-07 09:58:47,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:58:59,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 435.46567 ± 194.083
2025-08-07 09:58:59,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [47.282467, 512.2392, 456.79977, 246.5769, 570.23126, 679.82684, 477.7026, 716.5447, 336.6261, 310.82657]
2025-08-07 09:58:59,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:58:59,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (435.47) for latency MM1Queue_a033_s075
2025-08-07 09:58:59,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 36 minutes, 25 seconds)
2025-08-07 10:00:40,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:00:52,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 424.66928 ± 158.767
2025-08-07 10:00:52,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [613.3658, 529.8304, 301.4797, 438.21057, 146.62335, 614.3553, 423.00427, 172.84793, 524.85486, 482.12048]
2025-08-07 10:00:52,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:00:52,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 34 minutes, 30 seconds)
2025-08-07 10:02:33,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:02:45,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 508.55380 ± 161.831
2025-08-07 10:02:45,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [553.4526, 505.08136, 533.9155, 593.30206, 623.04974, 601.82996, 42.881145, 465.56985, 596.96844, 569.48773]
2025-08-07 10:02:45,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:02:45,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (508.55) for latency MM1Queue_a033_s075
2025-08-07 10:02:45,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 32 minutes, 34 seconds)
2025-08-07 10:04:26,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:04:38,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 373.93246 ± 226.765
2025-08-07 10:04:38,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [328.10065, 506.63385, 1.238352, 571.9517, 449.30084, -112.551605, 412.6311, 492.8437, 535.34546, 553.8308]
2025-08-07 10:04:38,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:04:38,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 30 minutes, 41 seconds)
2025-08-07 10:06:19,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:06:31,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 399.64865 ± 214.015
2025-08-07 10:06:31,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [447.3755, 474.77618, 351.72958, 559.50446, -110.80785, 617.0996, 523.7157, 137.06184, 571.1749, 424.85672]
2025-08-07 10:06:31,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:06:31,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 28 minutes, 48 seconds)
2025-08-07 10:08:13,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:24,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 540.30676 ± 47.050
2025-08-07 10:08:24,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [549.0717, 570.55206, 460.02164, 607.17163, 525.3424, 552.0007, 618.5218, 494.23737, 521.59875, 504.55002]
2025-08-07 10:08:24,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:08:24,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (540.31) for latency MM1Queue_a033_s075
2025-08-07 10:08:24,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 27 minutes)
2025-08-07 10:10:06,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:17,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 525.35602 ± 135.801
2025-08-07 10:10:17,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [402.8168, 808.81964, 599.8455, 476.8709, 388.25952, 503.4152, 334.28214, 645.3657, 617.7644, 476.1196]
2025-08-07 10:10:17,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:10:17,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 25 minutes, 8 seconds)
2025-08-07 10:11:59,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:10,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 507.21893 ± 92.086
2025-08-07 10:12:10,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [413.86606, 581.2368, 574.401, 443.6125, 581.05615, 289.89774, 581.7772, 511.77652, 564.2499, 530.3155]
2025-08-07 10:12:10,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:12:10,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 23 minutes, 15 seconds)
2025-08-07 10:13:52,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:03,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 366.63483 ± 255.085
2025-08-07 10:14:03,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-112.80991, 444.92416, 248.37202, 421.98798, 496.94327, -93.104675, 594.98236, 496.44962, 636.9612, 531.64197]
2025-08-07 10:14:03,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:14:03,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 21 minutes, 20 seconds)
2025-08-07 10:15:45,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:57,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 586.68091 ± 176.351
2025-08-07 10:15:57,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [767.19336, 491.5656, 542.35065, 703.71454, 535.4068, 163.25644, 773.9479, 630.2355, 504.78622, 754.35223]
2025-08-07 10:15:57,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:15:57,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (586.68) for latency MM1Queue_a033_s075
2025-08-07 10:15:57,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 19 minutes, 29 seconds)
2025-08-07 10:17:38,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:50,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 610.22467 ± 103.287
2025-08-07 10:17:50,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [771.4259, 705.1329, 556.9397, 609.4593, 443.3221, 621.25757, 519.3862, 610.616, 506.5776, 758.129]
2025-08-07 10:17:50,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:17:50,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (610.22) for latency MM1Queue_a033_s075
2025-08-07 10:17:50,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 17 minutes, 30 seconds)
2025-08-07 10:19:31,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:43,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 574.98297 ± 162.701
2025-08-07 10:19:43,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [542.89716, 574.747, 681.85236, 328.8043, 353.66705, 559.3012, 657.8164, 939.41376, 523.9131, 587.41766]
2025-08-07 10:19:43,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:19:43,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 15 minutes, 37 seconds)
2025-08-07 10:21:24,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:35,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 503.44305 ± 94.751
2025-08-07 10:21:35,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [351.48868, 608.7167, 439.39093, 458.7752, 641.11456, 379.6324, 616.1105, 516.8148, 477.78543, 544.60126]
2025-08-07 10:21:35,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:21:36,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 13 minutes, 44 seconds)
2025-08-07 10:23:17,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:28,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 524.01166 ± 79.819
2025-08-07 10:23:28,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [445.57132, 464.1468, 511.61304, 642.97205, 557.7176, 386.58044, 654.79395, 484.96472, 551.35986, 540.39685]
2025-08-07 10:23:28,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:23:28,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 11 minutes, 50 seconds)
2025-08-07 10:25:10,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:21,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 700.34973 ± 134.976
2025-08-07 10:25:21,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [865.1828, 718.97327, 852.9147, 654.88794, 777.4708, 487.45755, 449.139, 694.14044, 682.2322, 821.0984]
2025-08-07 10:25:21,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:25:21,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (700.35) for latency MM1Queue_a033_s075
2025-08-07 10:25:21,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 9 minutes, 52 seconds)
2025-08-07 10:27:02,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:14,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 654.34290 ± 51.806
2025-08-07 10:27:14,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [592.71277, 609.505, 652.78925, 704.32135, 577.9588, 719.29877, 642.16345, 745.24677, 652.9635, 646.4696]
2025-08-07 10:27:14,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:27:14,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 7 minutes, 59 seconds)
2025-08-07 10:28:55,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:07,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 629.84998 ± 182.488
2025-08-07 10:29:07,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [257.29086, 797.4039, 634.0733, 846.2313, 707.04663, 653.88025, 386.81976, 851.10297, 593.1927, 571.45764]
2025-08-07 10:29:07,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:29:07,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 6 minutes, 7 seconds)
2025-08-07 10:30:49,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:00,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 626.98053 ± 229.572
2025-08-07 10:31:00,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [741.45135, 599.57135, 822.7853, 691.708, 632.5548, 583.0471, 680.2045, 865.20685, 664.7888, -11.513354]
2025-08-07 10:31:00,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:31:00,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 4 minutes, 16 seconds)
2025-08-07 10:32:42,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:53,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 571.53619 ± 205.078
2025-08-07 10:32:53,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [509.67682, 456.0259, 789.14056, 369.19626, 106.32416, 789.86115, 744.29114, 610.9792, 681.20917, 658.6578]
2025-08-07 10:32:53,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:32:53,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 2 minutes, 25 seconds)
2025-08-07 10:34:35,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:46,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 781.45496 ± 85.959
2025-08-07 10:34:46,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [803.9864, 693.1777, 890.3282, 703.40173, 701.7942, 970.0051, 746.7739, 776.5109, 808.6087, 719.96216]
2025-08-07 10:34:46,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:34:46,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (781.45) for latency MM1Queue_a033_s075
2025-08-07 10:34:46,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 33 seconds)
2025-08-07 10:36:27,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:38,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 600.28650 ± 197.912
2025-08-07 10:36:38,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [306.47427, 552.843, 744.4953, 877.1969, 632.7103, 195.96231, 730.0744, 656.3375, 565.3933, 741.3778]
2025-08-07 10:36:38,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:36:39,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 58 minutes, 30 seconds)
2025-08-07 10:38:18,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:30,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 873.54114 ± 244.304
2025-08-07 10:38:30,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [535.3914, 809.3116, 793.4381, 1341.4647, 538.5384, 810.8267, 731.5293, 1062.7046, 1140.4785, 971.7277]
2025-08-07 10:38:30,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:38:30,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (873.54) for latency MM1Queue_a033_s075
2025-08-07 10:38:30,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 56 minutes, 17 seconds)
2025-08-07 10:40:10,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:21,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 614.00848 ± 166.832
2025-08-07 10:40:21,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [475.77023, 563.9064, 775.0832, 784.2348, 637.7129, 507.6164, 778.9688, 720.7807, 667.29865, 228.71288]
2025-08-07 10:40:21,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:40:21,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 54 minutes, 4 seconds)
2025-08-07 10:42:01,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:12,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 654.19519 ± 214.403
2025-08-07 10:42:12,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [918.82227, 102.81061, 726.2142, 677.7348, 693.12897, 788.49335, 649.9342, 546.21106, 861.04724, 577.5552]
2025-08-07 10:42:12,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:42:12,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 51 minutes, 46 seconds)
2025-08-07 10:43:52,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:03,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 778.90283 ± 291.194
2025-08-07 10:44:03,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [426.17145, 434.7429, 971.47955, 1020.78687, 908.1438, 519.04816, 356.7887, 1155.2073, 1066.0507, 930.609]
2025-08-07 10:44:03,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:44:03,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 49 minutes, 29 seconds)
2025-08-07 10:45:42,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:54,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 586.79633 ± 235.109
2025-08-07 10:45:54,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [822.33795, 157.81537, 768.1195, 707.6116, 120.1758, 592.3031, 724.5938, 742.07104, 576.7158, 656.21985]
2025-08-07 10:45:54,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:45:54,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 47 minutes, 22 seconds)
2025-08-07 10:47:33,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:45,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 652.72394 ± 204.239
2025-08-07 10:47:45,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [264.25534, 635.42975, 723.7054, 735.2745, 976.8169, 409.4965, 687.618, 543.1241, 623.4826, 928.036]
2025-08-07 10:47:45,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:47:45,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 45 minutes, 22 seconds)
2025-08-07 10:49:24,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:35,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 754.63171 ± 60.015
2025-08-07 10:49:35,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [693.14087, 757.9393, 655.766, 880.8116, 752.2669, 783.0273, 820.1894, 728.9406, 728.53265, 745.7029]
2025-08-07 10:49:35,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:49:35,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 43 minutes, 25 seconds)
2025-08-07 10:51:15,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:26,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 868.56708 ± 218.041
2025-08-07 10:51:26,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [806.04114, 548.7192, 1041.7975, 1081.3324, 708.6189, 1052.8578, 965.8054, 1170.8505, 509.60425, 800.0444]
2025-08-07 10:51:26,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:51:26,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 41 minutes, 33 seconds)
2025-08-07 10:53:06,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:17,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 831.09491 ± 161.992
2025-08-07 10:53:17,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [691.9661, 884.67267, 866.9868, 978.6411, 949.98785, 589.09045, 1029.0012, 852.9211, 941.04675, 526.6351]
2025-08-07 10:53:17,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:53:17,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 39 minutes, 42 seconds)
2025-08-07 10:54:56,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:08,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 963.51593 ± 153.653
2025-08-07 10:55:08,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [677.8755, 1125.255, 943.41675, 1024.7622, 892.1853, 955.7541, 1273.0138, 815.5247, 989.6666, 937.70526]
2025-08-07 10:55:08,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:55:08,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (963.52) for latency MM1Queue_a033_s075
2025-08-07 10:55:08,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 37 minutes, 52 seconds)
2025-08-07 10:56:47,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:58,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 853.79578 ± 145.326
2025-08-07 10:56:58,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1142.4521, 673.758, 932.12573, 793.66754, 607.8103, 862.8602, 888.3238, 823.69385, 813.2583, 1000.00793]
2025-08-07 10:56:58,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:56:58,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 35 minutes, 54 seconds)
2025-08-07 10:58:37,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:48,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 865.12219 ± 241.288
2025-08-07 10:58:48,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [780.6918, 689.1541, 660.54016, 1336.359, 441.8056, 936.9141, 1046.0979, 1065.6765, 952.3617, 741.6208]
2025-08-07 10:58:48,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:58:48,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 33 minutes, 55 seconds)
2025-08-07 11:00:26,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:38,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 862.71155 ± 129.185
2025-08-07 11:00:38,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [834.763, 854.31134, 838.9503, 722.60535, 983.9571, 1115.3374, 880.64984, 968.7264, 638.15765, 789.6571]
2025-08-07 11:00:38,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:00:38,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 31 minutes, 56 seconds)
2025-08-07 11:02:16,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:28,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 821.07507 ± 243.186
2025-08-07 11:02:28,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [943.41516, 770.18744, 786.9505, 538.1764, 1066.6934, 1182.2877, 948.01953, 291.6069, 794.07263, 889.34174]
2025-08-07 11:02:28,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:02:28,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 29 minutes, 57 seconds)
2025-08-07 11:04:06,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:18,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 663.98181 ± 232.088
2025-08-07 11:04:18,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [286.5931, 341.5672, 599.3668, 630.4507, 992.24835, 707.84686, 604.3513, 574.8953, 970.699, 931.7995]
2025-08-07 11:04:18,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:04:18,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 27 minutes, 59 seconds)
2025-08-07 11:05:56,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:08,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 821.33948 ± 144.952
2025-08-07 11:06:08,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [839.4062, 702.20135, 611.00714, 709.7782, 1028.9023, 966.4431, 809.68774, 690.79236, 1059.8988, 795.277]
2025-08-07 11:06:08,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:06:08,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 26 minutes, 8 seconds)
2025-08-07 11:07:46,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:58,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 952.26984 ± 129.589
2025-08-07 11:07:58,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [905.4235, 860.3417, 1019.6024, 854.8475, 1040.0516, 930.69904, 1078.4061, 944.63275, 700.42224, 1188.2712]
2025-08-07 11:07:58,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:07:58,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 24 minutes, 18 seconds)
2025-08-07 11:09:36,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1013.73749 ± 169.609
2025-08-07 11:09:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [935.7035, 1181.6261, 650.33185, 925.9238, 1057.8423, 996.4116, 1117.8993, 1000.1126, 1323.5934, 947.9305]
2025-08-07 11:09:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:09:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1013.74) for latency MM1Queue_a033_s075
2025-08-07 11:09:48,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 22 minutes, 30 seconds)
2025-08-07 11:11:26,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:11:38,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 765.86584 ± 271.950
2025-08-07 11:11:38,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [953.2647, 983.45, 893.758, 959.13336, 304.27332, 685.4132, 819.4939, 1013.642, 205.51863, 840.71155]
2025-08-07 11:11:38,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:11:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 20 minutes, 40 seconds)
2025-08-07 11:13:17,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:13:28,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1178.51965 ± 142.976
2025-08-07 11:13:28,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1278.8075, 1295.468, 1154.9048, 1306.4972, 1356.0426, 1285.1133, 912.3214, 1052.6266, 1147.0277, 996.38745]
2025-08-07 11:13:28,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:13:28,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1178.52) for latency MM1Queue_a033_s075
2025-08-07 11:13:28,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 18 minutes, 51 seconds)
2025-08-07 11:15:07,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:18,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1023.31360 ± 156.464
2025-08-07 11:15:18,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1097.6307, 878.23566, 1004.6535, 951.7578, 810.4977, 1193.5862, 1278.8395, 816.1575, 1196.3966, 1005.37994]
2025-08-07 11:15:18,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:15:18,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 17 minutes, 1 second)
2025-08-07 11:16:57,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:08,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 970.30530 ± 134.913
2025-08-07 11:17:08,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1120.5471, 976.96545, 636.42645, 854.3073, 990.7421, 1114.2935, 922.73346, 1030.6892, 1047.0148, 1009.3329]
2025-08-07 11:17:08,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:17:08,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 15 minutes, 12 seconds)
2025-08-07 11:18:47,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:58,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 980.03748 ± 207.318
2025-08-07 11:18:58,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [742.1216, 1121.5555, 1147.1066, 795.7434, 1064.4987, 1408.7926, 976.2846, 685.5314, 864.3613, 994.37933]
2025-08-07 11:18:58,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:18:58,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 13 minutes, 22 seconds)
2025-08-07 11:20:37,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:48,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1064.19751 ± 136.010
2025-08-07 11:20:48,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1087.7843, 1271.3921, 1115.1775, 1192.3132, 1043.1213, 970.87775, 992.12286, 1142.1874, 1081.7826, 745.2168]
2025-08-07 11:20:48,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:20:48,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 11 minutes, 31 seconds)
2025-08-07 11:22:27,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:38,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 885.40167 ± 216.620
2025-08-07 11:22:38,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [922.565, 1171.4315, 591.1637, 1006.46313, 938.01086, 886.4295, 903.1703, 906.7685, 409.18167, 1118.8324]
2025-08-07 11:22:38,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:22:38,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 9 minutes, 40 seconds)
2025-08-07 11:24:17,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:28,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1037.21265 ± 192.845
2025-08-07 11:24:28,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [839.61633, 1025.5579, 840.96246, 1281.8262, 993.5616, 1036.5116, 1361.4186, 1273.0898, 793.64215, 925.9395]
2025-08-07 11:24:28,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:24:28,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 7 minutes, 51 seconds)
2025-08-07 11:26:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:18,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 898.00116 ± 231.984
2025-08-07 11:26:18,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [653.55273, 810.2585, 1243.3707, 781.22754, 1117.3074, 1027.7834, 437.7567, 900.85754, 865.0746, 1142.8225]
2025-08-07 11:26:18,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:26:18,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 5 minutes, 59 seconds)
2025-08-07 11:27:57,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:08,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1017.16681 ± 139.069
2025-08-07 11:28:08,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [988.9214, 1077.9626, 972.55334, 1051.0012, 798.4679, 1083.7357, 953.01276, 1163.8093, 1272.1294, 810.0738]
2025-08-07 11:28:08,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:28:08,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 4 minutes, 9 seconds)
2025-08-07 11:29:47,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:58,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1072.15454 ± 198.449
2025-08-07 11:29:58,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1195.8607, 1295.0098, 1182.3291, 685.7448, 997.35645, 884.517, 846.1113, 1114.0022, 1298.7036, 1221.9114]
2025-08-07 11:29:58,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:29:58,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 2 minutes, 19 seconds)
2025-08-07 11:31:37,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:48,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1020.31750 ± 113.736
2025-08-07 11:31:48,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1014.33203, 907.8287, 1255.3579, 921.495, 1109.8838, 981.0244, 1123.4928, 997.88745, 1045.6333, 846.2394]
2025-08-07 11:31:48,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:31:48,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 28 seconds)
2025-08-07 11:33:26,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:38,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1066.95557 ± 194.019
2025-08-07 11:33:38,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1179.3949, 1196.9711, 994.04913, 1005.6017, 535.24493, 1133.4371, 1089.1412, 1181.29, 1257.1873, 1097.2379]
2025-08-07 11:33:38,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:33:38,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 58 minutes, 37 seconds)
2025-08-07 11:35:16,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:28,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1006.59113 ± 97.693
2025-08-07 11:35:28,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1162.7637, 1003.53253, 928.99713, 1075.3912, 1053.7118, 891.533, 953.9889, 1164.9382, 901.35986, 929.6941]
2025-08-07 11:35:28,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:35:28,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 56 minutes, 48 seconds)
2025-08-07 11:37:06,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:18,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 989.02734 ± 166.494
2025-08-07 11:37:18,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1193.1725, 765.06854, 897.1176, 967.6555, 766.8482, 1048.3842, 846.14966, 1022.0209, 1092.5872, 1291.2692]
2025-08-07 11:37:18,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:37:18,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 54 minutes, 58 seconds)
2025-08-07 11:38:56,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:08,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1079.45776 ± 135.099
2025-08-07 11:39:08,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1169.2933, 947.16064, 887.75977, 1234.6704, 1191.2456, 1278.2578, 945.87573, 1085.2135, 1129.5803, 925.5202]
2025-08-07 11:39:08,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:39:08,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 53 minutes, 8 seconds)
2025-08-07 11:40:46,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:58,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1142.91724 ± 117.885
2025-08-07 11:40:58,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1314.5875, 1213.584, 1068.0867, 1082.1425, 869.5947, 1187.2318, 1277.296, 1154.9005, 1135.9136, 1125.834]
2025-08-07 11:40:58,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:40:58,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 51 minutes, 18 seconds)
2025-08-07 11:42:36,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:48,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1162.78455 ± 157.543
2025-08-07 11:42:48,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1085.1869, 1458.8373, 1398.5618, 1068.76, 1142.0569, 1144.7949, 1248.1401, 951.31555, 1160.63, 969.56287]
2025-08-07 11:42:48,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:42:48,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 49 minutes, 30 seconds)
2025-08-07 11:44:26,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:38,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1164.22437 ± 161.842
2025-08-07 11:44:38,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1387.2053, 993.67645, 892.9511, 945.9051, 1341.1068, 1262.5094, 1128.2108, 1283.616, 1169.7605, 1237.3019]
2025-08-07 11:44:38,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:44:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 47 minutes, 41 seconds)
2025-08-07 11:46:16,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:28,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1054.16528 ± 149.273
2025-08-07 11:46:28,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [731.3509, 1055.148, 1139.5178, 1272.8926, 1088.9288, 1212.2902, 907.4839, 1140.2117, 966.9134, 1026.9148]
2025-08-07 11:46:28,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:46:28,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 45 minutes, 50 seconds)
2025-08-07 11:48:06,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:18,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1181.28333 ± 133.842
2025-08-07 11:48:18,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1114.9626, 951.0468, 1230.7435, 1034.6006, 1202.1354, 1387.4915, 1268.3167, 1095.5111, 1149.5778, 1378.4459]
2025-08-07 11:48:18,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:48:18,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1181.28) for latency MM1Queue_a033_s075
2025-08-07 11:48:18,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 44 minutes)
2025-08-07 11:49:56,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:08,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1044.74060 ± 154.225
2025-08-07 11:50:08,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [923.5061, 1303.1144, 1083.8187, 976.46454, 1014.39966, 1270.5107, 751.1971, 1061.8547, 1107.0795, 955.4622]
2025-08-07 11:50:08,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:50:08,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 42 minutes, 10 seconds)
2025-08-07 11:51:46,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:58,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1236.14575 ± 160.091
2025-08-07 11:51:58,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1471.5226, 1261.1826, 1117.3436, 1037.3671, 1252.9497, 1033.0942, 1418.5734, 1045.5963, 1317.7617, 1406.0657]
2025-08-07 11:51:58,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:51:58,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1236.15) for latency MM1Queue_a033_s075
2025-08-07 11:51:58,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 40 minutes, 18 seconds)
2025-08-07 11:53:36,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:48,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1112.19690 ± 204.755
2025-08-07 11:53:48,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1017.4914, 1409.7252, 1121.2853, 884.40674, 1192.665, 1161.9844, 832.471, 1506.4021, 1020.0777, 975.45874]
2025-08-07 11:53:48,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:53:48,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 38 minutes, 28 seconds)
2025-08-07 11:55:26,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:38,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1153.67444 ± 222.089
2025-08-07 11:55:38,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1197.6532, 1271.1072, 1265.4049, 1054.327, 1269.7314, 1260.2936, 1457.0874, 985.0358, 597.6112, 1178.4923]
2025-08-07 11:55:38,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:55:38,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 36 minutes, 38 seconds)
2025-08-07 11:57:16,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:27,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1118.76929 ± 87.317
2025-08-07 11:57:27,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1179.15, 1093.555, 1036.8108, 951.5225, 1101.8649, 1123.265, 1066.9396, 1251.7145, 1136.4559, 1246.414]
2025-08-07 11:57:27,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:57:27,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 34 minutes, 48 seconds)
2025-08-07 11:59:06,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:17,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1317.17700 ± 127.888
2025-08-07 11:59:17,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1218.6436, 1546.329, 1310.7783, 1265.6288, 1243.6428, 1218.9249, 1392.05, 1514.5632, 1121.0161, 1340.1942]
2025-08-07 11:59:17,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:59:17,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1317.18) for latency MM1Queue_a033_s075
2025-08-07 11:59:17,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 58 seconds)
2025-08-07 12:00:56,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:07,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1151.42883 ± 92.982
2025-08-07 12:01:07,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1034.5493, 1249.1892, 1082.0686, 1165.0249, 1099.366, 1258.0273, 1071.7218, 1314.4248, 1051.8844, 1188.0315]
2025-08-07 12:01:07,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:01:07,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 31 minutes, 8 seconds)
2025-08-07 12:02:46,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:57,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1270.43323 ± 128.729
2025-08-07 12:02:57,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1047.7153, 1267.8005, 1227.6361, 1138.6007, 1210.7214, 1176.372, 1422.6267, 1375.4031, 1474.9142, 1362.5421]
2025-08-07 12:02:57,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:02:57,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 18 seconds)
2025-08-07 12:04:36,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:04:47,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1138.68237 ± 158.510
2025-08-07 12:04:47,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1449.3524, 1082.7521, 1335.1903, 1058.1995, 953.39795, 1133.3177, 1028.2701, 1310.3772, 1043.9945, 991.972]
2025-08-07 12:04:47,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:04:47,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 29 seconds)
2025-08-07 12:06:26,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:37,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1193.23657 ± 104.492
2025-08-07 12:06:37,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1229.201, 998.47614, 1092.9451, 1146.9316, 1270.9783, 1106.5557, 1154.8383, 1313.8973, 1284.3688, 1334.1727]
2025-08-07 12:06:37,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:06:37,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 39 seconds)
2025-08-07 12:08:16,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:27,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1307.94507 ± 166.647
2025-08-07 12:08:27,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1187.6511, 1272.9229, 1631.3911, 1077.0872, 1435.9832, 1344.3623, 1380.4337, 1449.1085, 1074.5267, 1225.984]
2025-08-07 12:08:27,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:08:27,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 49 seconds)
2025-08-07 12:10:06,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:17,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1327.40137 ± 147.713
2025-08-07 12:10:17,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1257.8381, 1203.8423, 1152.0743, 1382.5773, 1684.1078, 1319.9801, 1195.0967, 1263.0347, 1372.6986, 1442.7643]
2025-08-07 12:10:17,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:10:17,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1327.40) for latency MM1Queue_a033_s075
2025-08-07 12:10:17,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes)
2025-08-07 12:11:56,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:07,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1290.98462 ± 153.849
2025-08-07 12:12:07,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1025.8624, 1185.0726, 1308.9221, 1396.7578, 1192.2239, 1533.911, 1403.0148, 1113.4705, 1284.1232, 1466.488]
2025-08-07 12:12:07,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:12:07,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 10 seconds)
2025-08-07 12:13:46,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:13:57,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1195.89648 ± 172.463
2025-08-07 12:13:57,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1191.4559, 1185.3624, 1362.6465, 1146.2009, 1188.6163, 1429.1013, 1023.92523, 1056.5605, 901.65985, 1473.435]
2025-08-07 12:13:57,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:13:57,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 19 seconds)
2025-08-07 12:15:36,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:15:47,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1319.41528 ± 153.531
2025-08-07 12:15:47,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1314.8412, 1319.6915, 1144.4408, 1448.6088, 1583.1406, 1208.0101, 1074.6849, 1323.0281, 1524.643, 1253.0651]
2025-08-07 12:15:47,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:15:47,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 29 seconds)
2025-08-07 12:17:26,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:37,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1235.86914 ± 162.765
2025-08-07 12:17:37,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1135.2738, 1192.2235, 1525.2283, 1450.9083, 1216.6422, 1360.4032, 1094.1705, 1227.2673, 1214.7345, 941.84015]
2025-08-07 12:17:37,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:17:37,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 39 seconds)
2025-08-07 12:19:16,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:27,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1323.58447 ± 157.487
2025-08-07 12:19:27,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1217.9777, 1516.111, 1169.2303, 1218.8357, 1409.5098, 1266.0336, 1212.0292, 1365.8862, 1187.8582, 1672.3727]
2025-08-07 12:19:27,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:19:27,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 49 seconds)
2025-08-07 12:21:06,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:17,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1439.20007 ± 86.661
2025-08-07 12:21:17,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1579.2346, 1373.1028, 1502.1761, 1420.2848, 1245.6083, 1449.2411, 1445.4353, 1527.6871, 1446.1271, 1403.1046]
2025-08-07 12:21:17,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:21:17,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1439.20) for latency MM1Queue_a033_s075
2025-08-07 12:21:17,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 59 seconds)
2025-08-07 12:22:56,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:07,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1209.93750 ± 195.502
2025-08-07 12:23:07,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1169.8087, 1388.3927, 1219.2018, 1421.921, 1342.7103, 1278.3191, 1051.1608, 1236.4128, 709.5944, 1281.8522]
2025-08-07 12:23:07,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:23:07,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 10 seconds)
2025-08-07 12:24:46,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:24:57,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1292.50183 ± 116.917
2025-08-07 12:24:57,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1210.5576, 1341.4669, 1459.1736, 1428.4607, 1276.4955, 1195.9005, 1145.5679, 1286.5414, 1132.1603, 1448.6947]
2025-08-07 12:24:57,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:24:57,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 20 seconds)
2025-08-07 12:26:36,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:47,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1310.77893 ± 129.921
2025-08-07 12:26:47,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1346.501, 1238.7527, 1281.3141, 1657.9001, 1374.8395, 1234.9515, 1184.5369, 1281.0214, 1191.3713, 1316.6012]
2025-08-07 12:26:47,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:26:47,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 30 seconds)
2025-08-07 12:28:26,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:38,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1462.25146 ± 70.683
2025-08-07 12:28:38,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1470.5806, 1573.5062, 1521.7411, 1452.0441, 1550.7336, 1498.7268, 1397.3842, 1349.1871, 1426.7191, 1381.8915]
2025-08-07 12:28:38,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:28:38,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1462.25) for latency MM1Queue_a033_s075
2025-08-07 12:28:38,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 40 seconds)
2025-08-07 12:30:16,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:28,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1386.72607 ± 134.668
2025-08-07 12:30:28,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1661.2357, 1607.2838, 1343.5527, 1276.208, 1288.28, 1334.7625, 1400.7715, 1247.1631, 1286.6682, 1421.335]
2025-08-07 12:30:28,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:30:28,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 50 seconds)
2025-08-07 12:32:05,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:17,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1574.84277 ± 167.536
2025-08-07 12:32:17,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1583.1687, 1250.8809, 1640.5173, 1697.2839, 1335.7742, 1734.8032, 1787.678, 1700.0337, 1553.9365, 1464.3518]
2025-08-07 12:32:17,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:32:17,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1574.84) for latency MM1Queue_a033_s075
2025-08-07 12:32:17,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1251 [DEBUG]: Training session finished
