2025-08-07 10:17:35,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc0-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:17:35,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc0-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:17:35,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14f5fb18fd10>}
2025-08-07 10:17:35,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 10:17:35,981 baseline-bpql-noiseperc0-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:17:35,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 10:17:35,997 baseline-bpql-noiseperc0-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 10:17:35,997 baseline-bpql-noiseperc0-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:17:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 10:17:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 10:19:07,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:07,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 53.12982 ± 0.696
2025-08-07 10:19:07,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [53.145977, 52.91557, 54.0256, 53.80074, 53.909943, 53.37683, 52.550293, 52.691597, 51.604607, 53.277096]
2025-08-07 10:19:07,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [45.0, 44.0, 45.0, 45.0, 45.0, 45.0, 44.0, 44.0, 43.0, 45.0]
2025-08-07 10:19:07,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (53.13) for latency MM1Queue_a033_s075
2025-08-07 10:19:07,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 30 minutes, 2 seconds)
2025-08-07 10:20:46,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:48,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 169.88773 ± 77.698
2025-08-07 10:20:48,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [227.931, 324.0377, 172.28781, 37.58519, 163.43852, 74.136894, 183.32152, 172.41289, 228.09825, 115.62755]
2025-08-07 10:20:48,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 273.0, 122.0, 153.0, 111.0, 280.0, 129.0, 120.0, 136.0, 150.0]
2025-08-07 10:20:48,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (169.89) for latency MM1Queue_a033_s075
2025-08-07 10:20:48,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 35 minutes, 58 seconds)
2025-08-07 10:22:26,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:28,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 49.86303 ± 27.190
2025-08-07 10:22:28,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [87.63172, 22.991676, 28.133184, 58.379356, 68.91019, 26.866116, 92.45076, 7.1103535, 44.533257, 61.623615]
2025-08-07 10:22:28,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [124.0, 165.0, 208.0, 88.0, 280.0, 154.0, 143.0, 194.0, 169.0, 256.0]
2025-08-07 10:22:28,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 37 minutes, 5 seconds)
2025-08-07 10:24:06,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:08,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 87.06059 ± 75.391
2025-08-07 10:24:08,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [111.296776, 83.626236, 290.0378, 100.88882, 61.933174, 6.6346774, 49.999706, 8.040375, 80.3829, 77.76545]
2025-08-07 10:24:08,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [236.0, 93.0, 215.0, 194.0, 100.0, 169.0, 168.0, 190.0, 92.0, 92.0]
2025-08-07 10:24:08,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 36 minutes, 33 seconds)
2025-08-07 10:25:46,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:47,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 183.89464 ± 27.363
2025-08-07 10:25:47,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [194.9944, 199.51718, 158.52776, 146.9925, 210.67389, 205.98161, 168.4071, 143.72264, 229.14548, 180.98381]
2025-08-07 10:25:47,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 126.0, 108.0, 109.0, 136.0, 139.0, 107.0, 120.0, 143.0, 122.0]
2025-08-07 10:25:47,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (183.89) for latency MM1Queue_a033_s075
2025-08-07 10:25:47,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 35 minutes, 26 seconds)
2025-08-07 10:27:25,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:26,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 83.91164 ± 46.486
2025-08-07 10:27:26,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [54.809895, 51.55263, 54.074337, 52.823124, 49.99314, 166.28755, 52.450043, 139.29811, 60.87782, 156.94975]
2025-08-07 10:27:26,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [56.0, 48.0, 49.0, 48.0, 48.0, 122.0, 48.0, 130.0, 56.0, 128.0]
2025-08-07 10:27:26,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 36 minutes, 5 seconds)
2025-08-07 10:29:04,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:05,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 202.89072 ± 45.294
2025-08-07 10:29:05,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [218.84094, 251.9264, 196.10144, 245.98355, 173.50238, 277.28162, 197.57898, 153.15108, 119.249725, 195.291]
2025-08-07 10:29:05,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 159.0, 123.0, 154.0, 171.0, 162.0, 131.0, 103.0, 112.0, 131.0]
2025-08-07 10:29:05,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (202.89) for latency MM1Queue_a033_s075
2025-08-07 10:29:05,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 34 minutes, 21 seconds)
2025-08-07 10:30:44,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:45,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 166.75601 ± 8.147
2025-08-07 10:30:45,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [151.7321, 169.6019, 177.66197, 174.08812, 170.13075, 166.12926, 166.20903, 170.58229, 151.83867, 169.58592]
2025-08-07 10:30:45,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [95.0, 111.0, 113.0, 121.0, 113.0, 118.0, 103.0, 113.0, 117.0, 117.0]
2025-08-07 10:30:45,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 32 minutes, 27 seconds)
2025-08-07 10:32:22,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:23,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 106.15354 ± 102.055
2025-08-07 10:32:23,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [67.98483, 409.4573, 66.938034, 55.866646, 81.47804, 65.261314, 86.22184, 69.37272, 55.181725, 103.77295]
2025-08-07 10:32:23,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [57.0, 237.0, 57.0, 55.0, 57.0, 58.0, 58.0, 61.0, 56.0, 62.0]
2025-08-07 10:32:23,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 30 minutes, 12 seconds)
2025-08-07 10:34:01,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:03,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 174.98950 ± 52.834
2025-08-07 10:34:03,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [136.71062, 160.38422, 174.00505, 215.32365, 140.98836, 204.77309, 302.59128, 119.908, 177.17662, 118.034065]
2025-08-07 10:34:03,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [79.0, 88.0, 96.0, 116.0, 85.0, 125.0, 142.0, 86.0, 97.0, 85.0]
2025-08-07 10:34:03,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 28 minutes, 33 seconds)
2025-08-07 10:35:40,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:42,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 159.12489 ± 10.384
2025-08-07 10:35:42,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [161.98384, 174.4029, 152.6205, 144.48341, 170.37457, 160.40758, 139.6005, 162.80232, 166.58699, 157.98637]
2025-08-07 10:35:42,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [108.0, 105.0, 96.0, 88.0, 112.0, 110.0, 110.0, 109.0, 110.0, 101.0]
2025-08-07 10:35:42,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 27 minutes, 8 seconds)
2025-08-07 10:37:19,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:19,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 119.10950 ± 3.228
2025-08-07 10:37:19,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [120.20801, 118.38292, 116.62497, 116.58124, 119.5759, 126.89526, 120.316925, 115.36259, 121.12508, 116.02212]
2025-08-07 10:37:19,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [79.0, 77.0, 76.0, 72.0, 76.0, 81.0, 77.0, 72.0, 75.0, 76.0]
2025-08-07 10:37:19,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 24 minutes, 53 seconds)
2025-08-07 10:38:58,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:59,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 153.81279 ± 8.123
2025-08-07 10:38:59,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [153.16098, 170.54443, 143.42516, 152.56577, 139.1715, 158.88089, 158.16895, 155.41682, 151.3864, 155.40714]
2025-08-07 10:38:59,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [103.0, 117.0, 93.0, 108.0, 93.0, 108.0, 105.0, 104.0, 103.0, 108.0]
2025-08-07 10:38:59,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 23 minutes, 8 seconds)
2025-08-07 10:40:36,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 158.54962 ± 7.552
2025-08-07 10:40:37,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [148.70409, 154.2954, 157.62372, 157.81705, 176.07364, 149.79802, 166.54968, 156.68918, 157.88138, 160.06407]
2025-08-07 10:40:37,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [90.0, 89.0, 91.0, 96.0, 98.0, 94.0, 94.0, 89.0, 92.0, 90.0]
2025-08-07 10:40:37,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 21 minutes, 36 seconds)
2025-08-07 10:42:15,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:16,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 139.24356 ± 11.966
2025-08-07 10:42:16,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [127.36952, 133.14792, 133.27083, 155.63925, 155.36949, 156.98366, 138.94838, 122.829285, 139.85791, 129.0194]
2025-08-07 10:42:16,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [70.0, 74.0, 73.0, 106.0, 104.0, 102.0, 82.0, 68.0, 84.0, 72.0]
2025-08-07 10:42:16,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 19 minutes, 49 seconds)
2025-08-07 10:43:54,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:55,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 206.10564 ± 38.841
2025-08-07 10:43:55,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [272.9482, 216.7336, 224.30684, 167.90736, 243.27061, 122.11168, 207.20041, 194.16449, 214.42146, 197.99167]
2025-08-07 10:43:55,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 114.0, 115.0, 113.0, 122.0, 91.0, 112.0, 107.0, 119.0, 118.0]
2025-08-07 10:43:55,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (206.11) for latency MM1Queue_a033_s075
2025-08-07 10:43:55,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 18 minutes, 6 seconds)
2025-08-07 10:45:33,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:33,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 71.80233 ± 64.394
2025-08-07 10:45:33,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [156.10324, 31.076996, 30.685234, 30.178553, 174.59581, 29.939034, 28.443224, 29.800396, 28.460638, 178.7402]
2025-08-07 10:45:33,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [92.0, 32.0, 32.0, 32.0, 95.0, 32.0, 31.0, 32.0, 31.0, 100.0]
2025-08-07 10:45:33,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 16 minutes, 39 seconds)
2025-08-07 10:47:12,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:13,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 140.21162 ± 22.126
2025-08-07 10:47:13,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [132.49924, 202.76443, 123.06278, 129.92627, 125.55833, 125.66778, 139.87927, 133.10342, 146.24101, 143.41385]
2025-08-07 10:47:13,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [81.0, 116.0, 76.0, 82.0, 79.0, 77.0, 88.0, 83.0, 88.0, 87.0]
2025-08-07 10:47:13,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 14 minutes, 59 seconds)
2025-08-07 10:48:51,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:51,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 102.62319 ± 3.406
2025-08-07 10:48:51,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [105.99491, 101.37, 109.78294, 98.6489, 101.67681, 100.107315, 102.55985, 97.82782, 103.3439, 104.91956]
2025-08-07 10:48:51,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [58.0, 57.0, 59.0, 59.0, 57.0, 58.0, 57.0, 57.0, 58.0, 58.0]
2025-08-07 10:48:51,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 13 minutes, 28 seconds)
2025-08-07 10:50:29,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:30,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 166.53485 ± 14.263
2025-08-07 10:50:30,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [169.41736, 160.44676, 173.65169, 142.94774, 159.09204, 150.5937, 156.6355, 189.15851, 182.79895, 180.60623]
2025-08-07 10:50:30,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [92.0, 95.0, 96.0, 90.0, 93.0, 90.0, 89.0, 97.0, 96.0, 91.0]
2025-08-07 10:50:30,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 11 minutes, 50 seconds)
2025-08-07 10:52:08,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:10,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 220.61259 ± 19.408
2025-08-07 10:52:10,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [232.23076, 226.1472, 230.15, 188.16568, 240.21849, 232.39021, 222.08897, 220.98396, 234.75476, 178.99603]
2025-08-07 10:52:10,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 123.0, 126.0, 107.0, 131.0, 126.0, 121.0, 121.0, 128.0, 102.0]
2025-08-07 10:52:10,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (220.61) for latency MM1Queue_a033_s075
2025-08-07 10:52:10,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 10 minutes, 20 seconds)
2025-08-07 10:53:48,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:50,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 253.77237 ± 12.240
2025-08-07 10:53:50,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [243.78886, 244.08075, 247.0539, 266.63928, 270.03674, 266.37338, 238.32753, 255.90736, 267.34418, 238.17172]
2025-08-07 10:53:50,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 125.0, 123.0, 135.0, 139.0, 133.0, 124.0, 127.0, 138.0, 121.0]
2025-08-07 10:53:50,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (253.77) for latency MM1Queue_a033_s075
2025-08-07 10:53:50,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 9 minutes, 5 seconds)
2025-08-07 10:55:27,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:30,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 241.73592 ± 150.692
2025-08-07 10:55:30,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [137.1463, 387.02335, 90.04923, 69.10785, 345.56763, 392.6235, 421.92026, 92.41904, 404.9319, 76.57012]
2025-08-07 10:55:30,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [215.0, 210.0, 183.0, 153.0, 189.0, 237.0, 431.0, 185.0, 230.0, 181.0]
2025-08-07 10:55:30,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 7 minutes, 39 seconds)
2025-08-07 10:57:08,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:13,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 459.98346 ± 204.622
2025-08-07 10:57:13,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [394.927, 519.9766, 417.894, 281.14105, 695.7825, 520.0211, 270.2456, 357.48297, 922.3744, 219.98923]
2025-08-07 10:57:13,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [309.0, 428.0, 259.0, 297.0, 408.0, 377.0, 204.0, 337.0, 692.0, 391.0]
2025-08-07 10:57:13,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (459.98) for latency MM1Queue_a033_s075
2025-08-07 10:57:13,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 6 minutes, 56 seconds)
2025-08-07 10:58:53,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:54,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 167.70123 ± 5.895
2025-08-07 10:58:54,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [163.27194, 167.20152, 163.47148, 165.52184, 175.80956, 158.58908, 174.09584, 177.32106, 163.03268, 168.69724]
2025-08-07 10:58:54,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [88.0, 90.0, 87.0, 87.0, 93.0, 85.0, 90.0, 93.0, 87.0, 89.0]
2025-08-07 10:58:54,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 5 minutes, 56 seconds)
2025-08-07 11:00:32,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:33,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 229.18228 ± 27.711
2025-08-07 11:00:33,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [216.98431, 203.21645, 219.03174, 278.8739, 208.86858, 260.11917, 203.6512, 236.64232, 265.2932, 199.14172]
2025-08-07 11:00:33,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [118.0, 109.0, 116.0, 145.0, 113.0, 136.0, 110.0, 125.0, 138.0, 105.0]
2025-08-07 11:00:33,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 8 seconds)
2025-08-07 11:02:12,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:13,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 249.10245 ± 12.178
2025-08-07 11:02:13,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [249.80296, 241.51564, 270.20703, 245.34755, 235.12679, 232.60396, 247.1376, 241.53964, 264.4424, 263.30087]
2025-08-07 11:02:13,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 130.0, 155.0, 132.0, 128.0, 130.0, 138.0, 131.0, 147.0, 148.0]
2025-08-07 11:02:13,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 2 minutes, 30 seconds)
2025-08-07 11:03:51,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 273.50726 ± 23.981
2025-08-07 11:03:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [238.0729, 271.18008, 294.15408, 296.13373, 277.98038, 245.21674, 287.02988, 314.1927, 266.70636, 244.40608]
2025-08-07 11:03:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 157.0, 174.0, 169.0, 162.0, 141.0, 169.0, 178.0, 154.0, 140.0]
2025-08-07 11:03:53,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 40 seconds)
2025-08-07 11:05:34,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:35,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 222.02893 ± 3.400
2025-08-07 11:05:35,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [218.21953, 225.13869, 219.06837, 219.96027, 218.51375, 222.00859, 227.33282, 227.0587, 219.04405, 223.94449]
2025-08-07 11:05:35,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [120.0, 122.0, 121.0, 120.0, 117.0, 121.0, 125.0, 123.0, 119.0, 122.0]
2025-08-07 11:05:35,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 58 minutes, 58 seconds)
2025-08-07 11:07:16,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:19,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 464.69473 ± 242.344
2025-08-07 11:07:19,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [399.7702, 625.54315, 405.62506, 446.04242, 1063.3855, 198.50337, 358.85962, 328.11792, 200.3388, 620.7613]
2025-08-07 11:07:19,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [260.0, 360.0, 342.0, 319.0, 730.0, 175.0, 225.0, 210.0, 178.0, 314.0]
2025-08-07 11:07:19,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (464.69) for latency MM1Queue_a033_s075
2025-08-07 11:07:19,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 57 minutes, 50 seconds)
2025-08-07 11:08:59,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:01,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 295.50336 ± 29.353
2025-08-07 11:09:01,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [308.42508, 356.09854, 260.46765, 272.71356, 262.81882, 304.73578, 322.4811, 300.8673, 303.4409, 262.98477]
2025-08-07 11:09:01,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [190.0, 213.0, 152.0, 158.0, 152.0, 183.0, 202.0, 186.0, 192.0, 157.0]
2025-08-07 11:09:01,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 56 minutes, 54 seconds)
2025-08-07 11:10:37,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:39,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 237.25423 ± 43.422
2025-08-07 11:10:39,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [164.91606, 171.73027, 265.4902, 272.65753, 264.85464, 266.91553, 275.43472, 281.74356, 199.9271, 208.87265]
2025-08-07 11:10:39,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [124.0, 134.0, 172.0, 173.0, 176.0, 173.0, 187.0, 190.0, 141.0, 152.0]
2025-08-07 11:10:39,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 54 minutes, 38 seconds)
2025-08-07 11:12:16,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:19,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 362.47656 ± 80.226
2025-08-07 11:12:19,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [334.64133, 352.24652, 532.2253, 198.21265, 328.54285, 375.81424, 389.8458, 312.52072, 411.34885, 389.3673]
2025-08-07 11:12:19,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [215.0, 222.0, 351.0, 131.0, 208.0, 248.0, 266.0, 184.0, 293.0, 265.0]
2025-08-07 11:12:19,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 53 minutes, 2 seconds)
2025-08-07 11:13:58,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:00,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 327.58282 ± 7.967
2025-08-07 11:14:00,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [329.66434, 320.94635, 339.79944, 342.6888, 322.49527, 317.79895, 321.47922, 332.09042, 322.20224, 326.66342]
2025-08-07 11:14:00,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [186.0, 182.0, 194.0, 192.0, 184.0, 181.0, 181.0, 192.0, 180.0, 183.0]
2025-08-07 11:14:00,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 51 minutes, 3 seconds)
2025-08-07 11:15:39,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:43,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 662.62573 ± 386.041
2025-08-07 11:15:43,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1458.9312, 1175.5664, 218.2917, 426.8186, 407.69467, 988.3341, 627.36163, 357.4945, 412.22086, 553.54364]
2025-08-07 11:15:43,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [673.0, 548.0, 184.0, 192.0, 245.0, 528.0, 295.0, 226.0, 210.0, 277.0]
2025-08-07 11:15:43,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (662.63) for latency MM1Queue_a033_s075
2025-08-07 11:15:43,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 49 minutes, 6 seconds)
2025-08-07 11:17:21,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:24,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 360.87762 ± 37.903
2025-08-07 11:17:24,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [369.92636, 347.72614, 358.16965, 355.41183, 461.2847, 386.74686, 327.77304, 328.47028, 335.7136, 337.5538]
2025-08-07 11:17:24,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [221.0, 206.0, 209.0, 207.0, 258.0, 226.0, 179.0, 186.0, 198.0, 196.0]
2025-08-07 11:17:24,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 47 minutes, 12 seconds)
2025-08-07 11:19:03,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:05,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 302.24231 ± 32.028
2025-08-07 11:19:05,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [268.77298, 246.04707, 324.72516, 305.83887, 338.77957, 301.21277, 321.10043, 316.804, 257.47885, 341.66367]
2025-08-07 11:19:05,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 141.0, 171.0, 159.0, 178.0, 160.0, 169.0, 164.0, 141.0, 179.0]
2025-08-07 11:19:05,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 46 minutes, 18 seconds)
2025-08-07 11:20:43,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:45,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 283.38086 ± 23.273
2025-08-07 11:20:45,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [275.60178, 311.34625, 246.43697, 308.9721, 293.8241, 258.10315, 290.65103, 263.73254, 317.04633, 268.0943]
2025-08-07 11:20:45,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [145.0, 162.0, 137.0, 158.0, 153.0, 140.0, 155.0, 141.0, 162.0, 144.0]
2025-08-07 11:20:45,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 44 minutes, 36 seconds)
2025-08-07 11:22:25,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 444.38980 ± 57.248
2025-08-07 11:22:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [473.26328, 504.75208, 406.77313, 437.47482, 485.66483, 340.93823, 354.778, 460.57138, 521.49304, 458.1891]
2025-08-07 11:22:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [260.0, 289.0, 197.0, 248.0, 240.0, 180.0, 182.0, 252.0, 303.0, 257.0]
2025-08-07 11:22:28,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 43 minutes, 14 seconds)
2025-08-07 11:24:07,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:13,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 865.58527 ± 645.690
2025-08-07 11:24:13,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [706.64966, 1461.3468, 289.8522, 434.38144, 336.60477, 407.88876, 444.3968, 2225.133, 643.174, 1706.4248]
2025-08-07 11:24:13,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [343.0, 693.0, 190.0, 252.0, 197.0, 251.0, 281.0, 1000.0, 347.0, 878.0]
2025-08-07 11:24:13,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (865.59) for latency MM1Queue_a033_s075
2025-08-07 11:24:13,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 41 minutes, 58 seconds)
2025-08-07 11:25:51,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:54,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 552.60040 ± 314.789
2025-08-07 11:25:54,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [491.70157, 393.58282, 397.1848, 584.6063, 1423.2069, 464.51434, 692.9677, 198.49501, 456.1037, 423.6409]
2025-08-07 11:25:54,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [255.0, 204.0, 204.0, 255.0, 564.0, 245.0, 298.0, 118.0, 236.0, 246.0]
2025-08-07 11:25:54,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 40 minutes, 18 seconds)
2025-08-07 11:27:32,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 495.97046 ± 174.920
2025-08-07 11:27:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [502.71497, 430.9923, 242.71255, 560.89154, 586.45154, 463.3298, 829.04346, 238.23799, 694.624, 410.70587]
2025-08-07 11:27:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [256.0, 231.0, 129.0, 261.0, 286.0, 247.0, 364.0, 129.0, 290.0, 220.0]
2025-08-07 11:27:35,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 38 minutes, 33 seconds)
2025-08-07 11:29:16,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:19,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 691.75275 ± 535.761
2025-08-07 11:29:19,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [470.7065, 481.21976, 2295.7546, 534.26843, 488.86588, 573.9726, 493.07092, 564.3668, 484.6115, 530.69025]
2025-08-07 11:29:19,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [204.0, 213.0, 1000.0, 225.0, 204.0, 232.0, 208.0, 229.0, 215.0, 223.0]
2025-08-07 11:29:19,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 37 minutes, 38 seconds)
2025-08-07 11:30:58,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:04,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1015.22382 ± 885.681
2025-08-07 11:31:04,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [192.03287, 1733.0548, 162.18497, 225.20593, 2291.2935, 1462.5062, 1376.6805, 151.49625, 2386.9167, 170.86586]
2025-08-07 11:31:04,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 731.0, 141.0, 177.0, 1000.0, 651.0, 629.0, 134.0, 1000.0, 136.0]
2025-08-07 11:31:04,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (1015.22) for latency MM1Queue_a033_s075
2025-08-07 11:31:04,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 36 minutes, 21 seconds)
2025-08-07 11:32:44,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:48,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 870.25360 ± 96.671
2025-08-07 11:32:48,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [944.2731, 1000.2628, 736.7991, 911.4123, 739.7642, 746.93695, 908.0841, 944.01074, 962.94464, 808.0484]
2025-08-07 11:32:48,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [382.0, 386.0, 311.0, 368.0, 314.0, 312.0, 357.0, 381.0, 375.0, 334.0]
2025-08-07 11:32:48,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 34 minutes, 30 seconds)
2025-08-07 11:34:26,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:31,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1081.17151 ± 577.594
2025-08-07 11:34:31,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [548.27997, 1494.6793, 493.15768, 1302.3889, 1192.0703, 1470.2969, 488.66397, 470.29056, 1008.4779, 2343.4092]
2025-08-07 11:34:31,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [244.0, 573.0, 218.0, 523.0, 457.0, 592.0, 218.0, 210.0, 408.0, 959.0]
2025-08-07 11:34:31,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (1081.17) for latency MM1Queue_a033_s075
2025-08-07 11:34:31,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 33 minutes, 5 seconds)
2025-08-07 11:36:16,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:17,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 288.18808 ± 40.767
2025-08-07 11:36:17,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [278.60208, 250.62434, 273.68777, 270.8291, 393.20865, 272.98242, 273.31842, 334.74585, 261.78677, 272.09534]
2025-08-07 11:36:17,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [122.0, 116.0, 121.0, 121.0, 154.0, 127.0, 128.0, 140.0, 117.0, 127.0]
2025-08-07 11:36:17,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 32 minutes, 12 seconds)
2025-08-07 11:37:56,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2224.66382 ± 503.266
2025-08-07 11:38:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2771.886, 2543.2793, 2679.0864, 2160.599, 1547.3975, 1613.9758, 1666.5056, 2835.9592, 1764.4672, 2663.484]
2025-08-07 11:38:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 881.0, 1000.0, 781.0, 554.0, 552.0, 600.0, 1000.0, 640.0, 1000.0]
2025-08-07 11:38:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (2224.66) for latency MM1Queue_a033_s075
2025-08-07 11:38:06,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 31 minutes, 18 seconds)
2025-08-07 11:39:50,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:02,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2272.81470 ± 74.795
2025-08-07 11:40:02,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2148.4482, 2299.6418, 2216.6091, 2229.131, 2178.0159, 2309.6582, 2385.8052, 2265.095, 2359.093, 2336.6477]
2025-08-07 11:40:02,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:40:02,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (2272.81) for latency MM1Queue_a033_s075
2025-08-07 11:40:02,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 31 minutes, 29 seconds)
2025-08-07 11:41:34,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:36,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 348.68311 ± 85.726
2025-08-07 11:41:36,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [365.15036, 320.04956, 236.84338, 231.80754, 349.76044, 453.5274, 299.5494, 290.9091, 458.76392, 480.46997]
2025-08-07 11:41:36,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 126.0, 100.0, 108.0, 127.0, 180.0, 125.0, 116.0, 154.0, 188.0]
2025-08-07 11:41:36,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 27 minutes, 53 seconds)
2025-08-07 11:43:17,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:23,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1337.78296 ± 625.236
2025-08-07 11:43:23,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1208.2777, 838.03674, 2382.462, 2062.2644, 1289.0945, 1308.2725, 402.47223, 457.70242, 1502.6683, 1926.58]
2025-08-07 11:43:23,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [456.0, 349.0, 895.0, 731.0, 528.0, 536.0, 174.0, 195.0, 606.0, 737.0]
2025-08-07 11:43:23,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 26 minutes, 52 seconds)
2025-08-07 11:45:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:05,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 976.97675 ± 583.892
2025-08-07 11:45:05,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [365.53506, 1175.1619, 1463.6519, 417.4797, 406.6367, 1014.7578, 2272.9446, 1110.6979, 1182.3418, 360.56027]
2025-08-07 11:45:05,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [196.0, 395.0, 464.0, 196.0, 188.0, 392.0, 797.0, 376.0, 384.0, 171.0]
2025-08-07 11:45:05,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 24 minutes, 31 seconds)
2025-08-07 11:46:50,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:02,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2783.11621 ± 186.727
2025-08-07 11:47:02,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2732.931, 3078.686, 2985.6045, 2902.9856, 2650.9363, 2839.6833, 2698.801, 2851.699, 2714.4116, 2375.423]
2025-08-07 11:47:02,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 914.0]
2025-08-07 11:47:02,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (2783.12) for latency MM1Queue_a033_s075
2025-08-07 11:47:02,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 24 minutes, 2 seconds)
2025-08-07 11:48:39,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:49,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2527.13916 ± 561.176
2025-08-07 11:48:49,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3190.8281, 1907.3801, 3071.3225, 2996.439, 1903.3619, 3147.6968, 2545.7886, 2112.6296, 1630.0106, 2765.9363]
2025-08-07 11:48:49,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 580.0, 1000.0, 959.0, 605.0, 1000.0, 743.0, 661.0, 497.0, 857.0]
2025-08-07 11:48:49,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 20 minutes, 43 seconds)
2025-08-07 11:50:35,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:39,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1031.91125 ± 826.092
2025-08-07 11:50:39,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [816.3163, 1716.5326, 547.9875, 1851.335, 491.6153, 223.80006, 284.1592, 1082.1775, 2902.8625, 402.3257]
2025-08-07 11:50:39,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [295.0, 510.0, 257.0, 569.0, 219.0, 142.0, 154.0, 373.0, 1000.0, 188.0]
2025-08-07 11:50:39,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 21 minutes, 29 seconds)
2025-08-07 11:52:16,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:27,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2554.80322 ± 652.671
2025-08-07 11:52:27,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3001.3303, 1426.7401, 2898.798, 1148.9641, 2850.1658, 2942.309, 2901.9065, 2861.4895, 3039.6897, 2476.6392]
2025-08-07 11:52:27,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 528.0, 1000.0, 421.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 865.0]
2025-08-07 11:52:27,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 19 minutes, 44 seconds)
2025-08-07 11:54:08,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:15,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1945.82288 ± 960.406
2025-08-07 11:54:15,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3039.2974, 1023.7663, 2137.2322, 437.5878, 867.71484, 3040.518, 2892.5864, 2932.9583, 1148.9701, 1937.5983]
2025-08-07 11:54:15,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 349.0, 602.0, 208.0, 309.0, 1000.0, 1000.0, 1000.0, 370.0, 574.0]
2025-08-07 11:54:15,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 18 minutes, 50 seconds)
2025-08-07 11:55:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:06,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2459.05811 ± 774.624
2025-08-07 11:56:06,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3095.1082, 932.12054, 3000.3596, 2014.3312, 3096.3455, 3101.7498, 3140.4895, 1672.9088, 1636.3628, 2900.8022]
2025-08-07 11:56:06,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 321.0, 1000.0, 683.0, 1000.0, 1000.0, 1000.0, 568.0, 555.0, 1000.0]
2025-08-07 11:56:06,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 16 minutes, 7 seconds)
2025-08-07 11:57:43,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:49,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1524.87036 ± 891.238
2025-08-07 11:57:49,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1000.2755, 1925.991, 1059.2156, 911.959, 913.28723, 821.6986, 3117.27, 2232.0962, 2855.4392, 411.47046]
2025-08-07 11:57:49,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [375.0, 601.0, 374.0, 325.0, 320.0, 303.0, 1000.0, 725.0, 862.0, 194.0]
2025-08-07 11:57:49,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 13 minutes, 48 seconds)
2025-08-07 11:59:33,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:45,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2636.51025 ± 538.374
2025-08-07 11:59:45,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2794.823, 2682.8745, 2967.321, 2818.9822, 2807.0317, 2721.0815, 2824.138, 2872.4507, 1036.4939, 2839.9065]
2025-08-07 11:59:45,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 367.0, 1000.0]
2025-08-07 11:59:45,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 12 minutes, 48 seconds)
2025-08-07 12:01:21,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:29,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2019.52710 ± 921.233
2025-08-07 12:01:29,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3076.6614, 1958.9878, 3023.1646, 1731.4412, 2990.9534, 239.74217, 1450.2039, 1108.2439, 1658.7052, 2957.168]
2025-08-07 12:01:29,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 698.0, 1000.0, 622.0, 1000.0, 149.0, 521.0, 407.0, 569.0, 1000.0]
2025-08-07 12:01:29,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 10 minutes, 30 seconds)
2025-08-07 12:03:11,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:22,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2844.09424 ± 480.758
2025-08-07 12:03:22,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3157.8198, 3113.3972, 3078.77, 3112.566, 2201.3574, 1657.6483, 2835.2207, 3137.623, 3049.41, 3097.13]
2025-08-07 12:03:22,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 713.0, 510.0, 913.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:03:22,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (2844.09) for latency MM1Queue_a033_s075
2025-08-07 12:03:22,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 9 minutes, 16 seconds)
2025-08-07 12:05:03,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:13,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2466.27222 ± 1095.785
2025-08-07 12:05:13,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2995.6348, 3015.044, 3061.7192, 2928.9016, 3000.5476, 3064.8218, 268.07327, 2973.7676, 3069.6885, 284.52332]
2025-08-07 12:05:13,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 163.0, 1000.0, 1000.0, 165.0]
2025-08-07 12:05:13,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 7 minutes, 27 seconds)
2025-08-07 12:06:51,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:02,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2818.18457 ± 791.631
2025-08-07 12:07:02,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [492.07425, 3198.4634, 3040.0242, 3050.4526, 3088.0308, 3182.2866, 2627.9785, 3156.8252, 3187.1108, 3158.6003]
2025-08-07 12:07:02,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 817.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:07:02,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 6 minutes, 22 seconds)
2025-08-07 12:08:46,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:58,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2959.04712 ± 49.607
2025-08-07 12:08:58,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2871.2007, 2864.8523, 2999.345, 2958.659, 3003.786, 2960.484, 2968.674, 3004.1375, 2953.0693, 3006.263]
2025-08-07 12:08:58,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:08:58,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (2959.05) for latency MM1Queue_a033_s075
2025-08-07 12:08:58,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 4 minutes, 33 seconds)
2025-08-07 12:10:42,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:54,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3287.47388 ± 35.959
2025-08-07 12:10:54,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3243.3, 3358.649, 3268.5735, 3274.9968, 3328.812, 3320.2039, 3259.9438, 3302.089, 3254.7952, 3263.3748]
2025-08-07 12:10:54,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:10:54,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (3287.47) for latency MM1Queue_a033_s075
2025-08-07 12:10:54,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 4 minutes, 1 second)
2025-08-07 12:12:29,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:38,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2441.20142 ± 1195.018
2025-08-07 12:12:38,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3447.8098, 1092.0796, 3393.0786, 833.8607, 3422.5867, 3378.7131, 3386.8264, 872.48676, 1126.9119, 3457.661]
2025-08-07 12:12:38,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 346.0, 1000.0, 277.0, 1000.0, 1000.0, 1000.0, 293.0, 355.0, 1000.0]
2025-08-07 12:12:38,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 1 minute, 6 seconds)
2025-08-07 12:14:18,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:31,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3092.12939 ± 39.251
2025-08-07 12:14:31,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3026.8408, 3121.1472, 3084.3235, 3090.4556, 3128.4336, 3041.5312, 3063.759, 3091.7222, 3108.162, 3164.916]
2025-08-07 12:14:31,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:14:31,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 59 minutes, 29 seconds)
2025-08-07 12:16:10,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:23,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3125.47412 ± 39.331
2025-08-07 12:16:23,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3108.0674, 3175.9683, 3120.0364, 3100.5742, 3145.5266, 3081.1885, 3147.2417, 3164.0996, 3047.3638, 3164.6758]
2025-08-07 12:16:23,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:16:23,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 57 minutes, 57 seconds)
2025-08-07 12:18:09,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:21,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3322.87695 ± 55.431
2025-08-07 12:18:21,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3310.055, 3294.6838, 3304.4285, 3312.6013, 3298.722, 3317.6094, 3477.9648, 3289.601, 3352.1355, 3270.9717]
2025-08-07 12:18:21,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:18:21,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (3322.88) for latency MM1Queue_a033_s075
2025-08-07 12:18:21,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 56 minutes, 16 seconds)
2025-08-07 12:19:55,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:08,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3228.17627 ± 55.285
2025-08-07 12:20:08,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3194.86, 3237.9653, 3222.4038, 3362.8435, 3190.449, 3195.5698, 3163.814, 3216.0437, 3206.75, 3291.0652]
2025-08-07 12:20:08,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:20:08,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 53 minutes, 31 seconds)
2025-08-07 12:21:49,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:01,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3176.96094 ± 617.275
2025-08-07 12:22:01,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3376.7043, 3380.0625, 3401.755, 3369.624, 1328.33, 3447.8086, 3436.3093, 3352.6245, 3355.1963, 3321.1948]
2025-08-07 12:22:01,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 423.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:22:01,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 52 minutes, 33 seconds)
2025-08-07 12:23:41,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:54,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3440.09766 ± 41.109
2025-08-07 12:23:54,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3440.8313, 3423.1155, 3396.7417, 3476.2344, 3452.773, 3365.9185, 3480.5825, 3417.3335, 3514.606, 3432.8381]
2025-08-07 12:23:54,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:23:54,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (3440.10) for latency MM1Queue_a033_s075
2025-08-07 12:23:54,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 50 minutes, 40 seconds)
2025-08-07 12:25:32,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:44,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3270.02026 ± 42.768
2025-08-07 12:25:44,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3306.69, 3286.8945, 3310.5017, 3257.0789, 3300.8533, 3201.6443, 3245.6138, 3271.2036, 3324.7668, 3194.9543]
2025-08-07 12:25:44,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:25:44,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 48 minutes, 40 seconds)
2025-08-07 12:27:24,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:27:34,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2573.36426 ± 1218.137
2025-08-07 12:27:34,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3376.6409, 3372.833, 1063.1506, 3324.642, 3411.667, 548.3844, 3339.752, 3380.2441, 559.8861, 3356.443]
2025-08-07 12:27:34,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 333.0, 1000.0, 1000.0, 222.0, 1000.0, 1000.0, 230.0, 1000.0]
2025-08-07 12:27:34,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 46 minutes, 4 seconds)
2025-08-07 12:29:20,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:29:33,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3239.42627 ± 35.010
2025-08-07 12:29:33,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3151.8042, 3247.6719, 3279.6829, 3257.2896, 3267.7827, 3231.555, 3260.5444, 3258.8667, 3212.824, 3226.2402]
2025-08-07 12:29:33,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:29:33,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 10 seconds)
2025-08-07 12:31:13,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:31:25,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3262.01025 ± 32.920
2025-08-07 12:31:25,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3287.0444, 3273.629, 3238.639, 3312.5566, 3226.3818, 3278.0996, 3259.5212, 3216.1855, 3223.204, 3304.8403]
2025-08-07 12:31:25,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:31:25,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 16 seconds)
2025-08-07 12:33:05,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:33:16,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3066.91455 ± 829.873
2025-08-07 12:33:16,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3369.833, 3355.1797, 3296.801, 582.5248, 3341.4722, 3373.7117, 3463.4148, 3334.0198, 3249.5125, 3302.6748]
2025-08-07 12:33:16,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 257.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:33:16,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 15 seconds)
2025-08-07 12:34:51,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:35:03,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3233.31909 ± 20.164
2025-08-07 12:35:03,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3239.2463, 3189.3755, 3239.6316, 3236.6548, 3233.9, 3231.299, 3208.5024, 3251.8433, 3236.6401, 3266.0977]
2025-08-07 12:35:03,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:35:03,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 8 seconds)
2025-08-07 12:36:44,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:56,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3367.09131 ± 26.926
2025-08-07 12:36:56,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3384.6865, 3387.5178, 3356.785, 3407.7466, 3352.3774, 3321.9043, 3378.1658, 3390.395, 3367.1453, 3324.192]
2025-08-07 12:36:56,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:36:56,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 27 seconds)
2025-08-07 12:38:43,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:38:56,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3446.02979 ± 28.951
2025-08-07 12:38:56,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3459.0405, 3446.0251, 3452.8062, 3440.8606, 3466.4907, 3419.7744, 3431.5266, 3420.926, 3514.8887, 3407.9585]
2025-08-07 12:38:56,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:38:56,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (3446.03) for latency MM1Queue_a033_s075
2025-08-07 12:38:56,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 39 seconds)
2025-08-07 12:40:36,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:48,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3548.91748 ± 48.920
2025-08-07 12:40:48,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3439.8606, 3503.6565, 3591.5198, 3590.0493, 3590.4133, 3526.1938, 3536.9463, 3595.4258, 3527.6414, 3587.4673]
2025-08-07 12:40:48,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:40:48,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (3548.92) for latency MM1Queue_a033_s075
2025-08-07 12:40:48,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 33 minutes, 47 seconds)
2025-08-07 12:42:28,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:42:40,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3448.77148 ± 54.598
2025-08-07 12:42:40,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3490.7024, 3488.636, 3514.6804, 3326.1814, 3429.7832, 3389.5256, 3489.346, 3482.9436, 3447.2085, 3428.7097]
2025-08-07 12:42:40,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:42:40,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 31 minutes, 56 seconds)
2025-08-07 12:44:16,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:44:27,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3033.30420 ± 865.891
2025-08-07 12:44:27,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3310.627, 3304.067, 3369.1055, 3243.7888, 3338.3162, 3330.0447, 3313.2822, 437.41272, 3333.4685, 3352.9297]
2025-08-07 12:44:27,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 187.0, 1000.0, 1000.0]
2025-08-07 12:44:27,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 4 seconds)
2025-08-07 12:46:02,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:46:10,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2082.09937 ± 1474.338
2025-08-07 12:46:10,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3283.1829, 224.55806, 3276.3674, 3217.0806, 260.08633, 3364.3284, 298.91504, 324.95215, 3270.8757, 3300.6467]
2025-08-07 12:46:10,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 122.0, 1000.0, 1000.0, 135.0, 1000.0, 148.0, 151.0, 1000.0, 1000.0]
2025-08-07 12:46:10,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 43 seconds)
2025-08-07 12:47:54,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:48:06,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3656.56714 ± 86.474
2025-08-07 12:48:06,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3691.284, 3414.8096, 3699.4155, 3602.279, 3661.7344, 3713.2532, 3671.2146, 3695.5647, 3713.7715, 3702.342]
2025-08-07 12:48:06,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 929.0, 1000.0, 979.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:48:06,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (3656.57) for latency MM1Queue_a033_s075
2025-08-07 12:48:06,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 41 seconds)
2025-08-07 12:49:46,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:49:57,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3114.87354 ± 929.432
2025-08-07 12:49:57,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3483.8909, 331.92206, 3379.0684, 3428.2156, 3408.014, 3421.6135, 3353.3635, 3346.681, 3446.5625, 3549.402]
2025-08-07 12:49:57,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 140.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:49:57,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 46 seconds)
2025-08-07 12:51:37,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:51:49,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3619.57935 ± 60.408
2025-08-07 12:51:49,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3690.9797, 3617.7131, 3699.3752, 3666.2556, 3599.269, 3564.9558, 3522.4028, 3537.1348, 3676.177, 3621.5276]
2025-08-07 12:51:49,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:51:49,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 56 seconds)
2025-08-07 12:53:30,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:53:42,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3506.01318 ± 28.400
2025-08-07 12:53:42,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3559.3787, 3502.5044, 3537.2986, 3478.9373, 3472.906, 3529.4417, 3506.7427, 3484.4253, 3470.7427, 3517.756]
2025-08-07 12:53:42,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:53:42,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 21 seconds)
2025-08-07 12:55:22,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:55:34,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3430.01245 ± 58.988
2025-08-07 12:55:34,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3380.6052, 3485.943, 3386.2456, 3403.9243, 3355.098, 3353.4116, 3472.607, 3508.7322, 3439.6, 3513.9585]
2025-08-07 12:55:34,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:55:34,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 47 seconds)
2025-08-07 12:57:15,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:57:26,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3347.50928 ± 1013.849
2025-08-07 12:57:26,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3714.3113, 3732.494, 3639.7012, 3775.5835, 3657.6099, 3784.5588, 312.8997, 3613.204, 3692.3552, 3552.3755]
2025-08-07 12:57:26,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 164.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:57:26,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 47 seconds)
2025-08-07 12:58:58,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:59:10,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3319.71338 ± 1002.924
2025-08-07 12:59:10,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3639.9329, 311.5432, 3677.753, 3634.1284, 3685.4702, 3646.8228, 3624.3845, 3683.0984, 3647.228, 3646.773]
2025-08-07 12:59:10,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 146.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:59:10,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 44 seconds)
2025-08-07 13:00:47,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:00:59,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3415.75537 ± 972.396
2025-08-07 13:00:59,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3737.7568, 3743.2668, 3718.2021, 3727.8486, 499.82803, 3680.5112, 3791.5188, 3771.2651, 3730.9062, 3756.45]
2025-08-07 13:00:59,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 200.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:00:59,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 49 seconds)
2025-08-07 13:02:39,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:02:52,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3513.81958 ± 28.570
2025-08-07 13:02:52,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3534.2634, 3525.423, 3464.1057, 3465.4019, 3518.3403, 3535.3582, 3551.3533, 3520.213, 3491.2239, 3532.5112]
2025-08-07 13:02:52,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:02:52,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 58 seconds)
2025-08-07 13:04:31,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:04:43,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3334.16455 ± 888.847
2025-08-07 13:04:43,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3614.4006, 3622.0247, 3599.605, 3671.1392, 3639.738, 669.00037, 3576.2188, 3672.8284, 3649.9375, 3626.7534]
2025-08-07 13:04:43,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 276.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:04:43,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 8 seconds)
2025-08-07 13:06:22,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:06:29,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1977.53247 ± 1665.393
2025-08-07 13:06:29,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [492.47208, 3606.5957, 271.2755, 3682.3462, 269.34262, 3702.542, 263.8965, 3578.4468, 3637.0952, 271.31177]
2025-08-07 13:06:29,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [194.0, 1000.0, 134.0, 1000.0, 133.0, 1000.0, 133.0, 1000.0, 1000.0, 134.0]
2025-08-07 13:06:29,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 14 seconds)
2025-08-07 13:08:14,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:08:25,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3416.01831 ± 1044.788
2025-08-07 13:08:25,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3718.94, 283.1669, 3732.8413, 3791.2942, 3771.9534, 3781.202, 3764.55, 3732.0068, 3835.8787, 3748.3508]
2025-08-07 13:08:25,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 136.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:08:25,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 33 seconds)
2025-08-07 13:10:04,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:10:12,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2297.72095 ± 1638.561
2025-08-07 13:10:12,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3676.3447, 127.00642, 215.32045, 317.43576, 3661.2644, 3713.0579, 3587.6548, 3610.441, 519.4216, 3549.2622]
2025-08-07 13:10:12,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 72.0, 107.0, 133.0, 1000.0, 1000.0, 1000.0, 1000.0, 179.0, 1000.0]
2025-08-07 13:10:12,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 41 seconds)
2025-08-07 13:11:52,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:12:04,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3725.09912 ± 13.955
2025-08-07 13:12:04,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3733.4856, 3734.6528, 3732.5144, 3689.4705, 3736.5007, 3720.5466, 3735.621, 3732.2886, 3722.761, 3713.1519]
2025-08-07 13:12:04,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:12:04,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (3725.10) for latency MM1Queue_a033_s075
2025-08-07 13:12:04,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 50 seconds)
2025-08-07 13:13:37,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:13:50,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3815.63232 ± 39.184
2025-08-07 13:13:50,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3792.1208, 3877.081, 3824.0667, 3772.6448, 3801.9453, 3769.6558, 3812.858, 3869.9758, 3861.7622, 3774.2124]
2025-08-07 13:13:50,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:13:50,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (3815.63) for latency MM1Queue_a033_s075
2025-08-07 13:13:50,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1251 [DEBUG]: Training session finished
