2025-08-07 09:28:23,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc0-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:28:23,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc0-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:28:23,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14e40379fdd0>}
2025-08-07 09:28:23,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 09:28:23,707 baseline-bpql-noiseperc0-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 09:28:23,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1133 [INFO]: Creating new trainer
2025-08-07 09:28:23,724 baseline-bpql-noiseperc0-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 09:28:23,724 baseline-bpql-noiseperc0-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 09:28:25,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 09:28:25,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 09:29:54,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:29:54,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 49.97161 ± 0.917
2025-08-07 09:29:54,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [49.547997, 51.447968, 49.153942, 49.436153, 51.24802, 49.319088, 49.474228, 51.391735, 49.343575, 49.353374]
2025-08-07 09:29:54,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [34.0, 35.0, 34.0, 34.0, 35.0, 34.0, 34.0, 35.0, 34.0, 34.0]
2025-08-07 09:29:54,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (49.97) for latency MM1Queue_a033_s075
2025-08-07 09:29:54,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 28 minutes, 8 seconds)
2025-08-07 09:31:31,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:31:33,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 325.46649 ± 94.203
2025-08-07 09:31:33,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [342.58005, 342.0169, 357.60486, 316.88968, 371.3262, 333.63724, 50.655346, 364.2661, 381.1067, 394.58194]
2025-08-07 09:31:33,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [190.0, 194.0, 204.0, 179.0, 203.0, 183.0, 42.0, 191.0, 205.0, 208.0]
2025-08-07 09:31:33,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (325.47) for latency MM1Queue_a033_s075
2025-08-07 09:31:33,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 33 minutes, 50 seconds)
2025-08-07 09:33:09,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:33:11,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 129.82408 ± 12.661
2025-08-07 09:33:11,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [134.80629, 140.80542, 111.33574, 131.6096, 144.17775, 115.20467, 136.02979, 107.3986, 139.66708, 137.20589]
2025-08-07 09:33:11,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 120.0, 91.0, 113.0, 121.0, 96.0, 116.0, 87.0, 119.0, 117.0]
2025-08-07 09:33:11,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 34 minutes, 8 seconds)
2025-08-07 09:34:48,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:34:49,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 161.24413 ± 2.286
2025-08-07 09:34:49,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [161.8404, 166.08678, 162.82285, 161.25659, 160.09833, 159.3271, 158.50827, 162.64928, 157.91861, 161.93312]
2025-08-07 09:34:49,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 92.0, 90.0, 90.0, 89.0, 89.0, 88.0, 90.0, 88.0, 90.0]
2025-08-07 09:34:49,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 33 minutes, 47 seconds)
2025-08-07 09:36:26,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:36:28,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 325.40582 ± 9.648
2025-08-07 09:36:28,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [331.40213, 328.49594, 336.12607, 329.24136, 311.38004, 323.39612, 314.36755, 312.84595, 341.9902, 324.81308]
2025-08-07 09:36:28,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [161.0, 158.0, 160.0, 159.0, 154.0, 158.0, 155.0, 152.0, 162.0, 160.0]
2025-08-07 09:36:28,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 33 minutes, 11 seconds)
2025-08-07 09:38:07,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:08,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 162.23642 ± 66.579
2025-08-07 09:38:08,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [235.26802, 119.38175, 175.21759, 120.55386, 328.5845, 118.005875, 168.0535, 117.10265, 120.72596, 119.470406]
2025-08-07 09:38:08,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 78.0, 110.0, 79.0, 172.0, 78.0, 110.0, 77.0, 79.0, 79.0]
2025-08-07 09:38:08,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 34 minutes, 37 seconds)
2025-08-07 09:39:45,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:39:47,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 367.66940 ± 14.080
2025-08-07 09:39:47,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [354.65863, 350.70007, 372.76816, 397.40628, 367.00607, 377.1635, 353.9423, 364.012, 356.95584, 382.08124]
2025-08-07 09:39:47,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 172.0, 183.0, 190.0, 178.0, 182.0, 174.0, 176.0, 176.0, 186.0]
2025-08-07 09:39:47,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (367.67) for latency MM1Queue_a033_s075
2025-08-07 09:39:47,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 33 minutes, 13 seconds)
2025-08-07 09:41:24,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:41:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 438.57373 ± 115.873
2025-08-07 09:41:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [243.79083, 508.8706, 508.1957, 585.94226, 470.72723, 437.43857, 389.96234, 491.68237, 533.31793, 215.80959]
2025-08-07 09:41:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 232.0, 237.0, 269.0, 228.0, 217.0, 196.0, 224.0, 249.0, 129.0]
2025-08-07 09:41:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (438.57) for latency MM1Queue_a033_s075
2025-08-07 09:41:26,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 31 minutes, 55 seconds)
2025-08-07 09:43:04,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:43:06,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 309.46744 ± 246.866
2025-08-07 09:43:06,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [196.94356, 183.56705, 183.7663, 195.2648, 646.9755, 183.91614, 194.34828, 188.26782, 194.67882, 926.94617]
2025-08-07 09:43:06,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [110.0, 106.0, 105.0, 109.0, 271.0, 105.0, 109.0, 106.0, 109.0, 357.0]
2025-08-07 09:43:06,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 30 minutes, 41 seconds)
2025-08-07 09:44:45,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:44:48,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 729.93518 ± 225.578
2025-08-07 09:44:48,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [589.36865, 775.4586, 980.4781, 773.24756, 1088.2076, 717.7134, 654.9837, 713.98987, 807.04944, 198.855]
2025-08-07 09:44:48,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [225.0, 270.0, 338.0, 268.0, 358.0, 250.0, 253.0, 250.0, 278.0, 104.0]
2025-08-07 09:44:48,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (729.94) for latency MM1Queue_a033_s075
2025-08-07 09:44:48,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 29 minutes, 47 seconds)
2025-08-07 09:46:24,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:46:28,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 785.35089 ± 275.074
2025-08-07 09:46:28,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [695.0716, 1555.051, 712.1249, 814.5824, 534.18085, 804.9278, 535.80383, 643.76654, 734.39343, 823.6067]
2025-08-07 09:46:28,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [299.0, 629.0, 254.0, 301.0, 214.0, 285.0, 216.0, 247.0, 264.0, 299.0]
2025-08-07 09:46:28,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (785.35) for latency MM1Queue_a033_s075
2025-08-07 09:46:28,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 28 minutes, 20 seconds)
2025-08-07 09:48:11,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:48:13,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 463.47031 ± 484.202
2025-08-07 09:48:13,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [277.56302, 281.5896, 283.95804, 1910.1227, 279.63626, 308.96225, 429.61325, 299.6771, 272.1612, 291.41956]
2025-08-07 09:48:13,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [147.0, 150.0, 147.0, 851.0, 147.0, 160.0, 198.0, 157.0, 144.0, 153.0]
2025-08-07 09:48:13,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 28 minutes, 27 seconds)
2025-08-07 09:49:48,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:49:50,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 235.31050 ± 40.487
2025-08-07 09:49:50,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [259.42325, 244.52785, 222.52205, 252.09268, 249.15605, 119.84881, 264.01636, 259.2086, 229.03093, 253.27837]
2025-08-07 09:49:50,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 139.0, 132.0, 140.0, 139.0, 75.0, 142.0, 142.0, 134.0, 140.0]
2025-08-07 09:49:50,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 26 minutes, 8 seconds)
2025-08-07 09:51:24,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:51:30,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1422.20837 ± 628.707
2025-08-07 09:51:30,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2290.4417, 760.1957, 1693.0679, 2144.3142, 1446.2517, 2303.4556, 1128.977, 460.86795, 1090.3417, 904.16986]
2025-08-07 09:51:30,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [772.0, 316.0, 572.0, 752.0, 481.0, 788.0, 369.0, 190.0, 375.0, 338.0]
2025-08-07 09:51:30,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (1422.21) for latency MM1Queue_a033_s075
2025-08-07 09:51:30,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 24 minutes, 35 seconds)
2025-08-07 09:53:08,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:53:11,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 799.54120 ± 57.406
2025-08-07 09:53:11,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [837.272, 745.21216, 727.21655, 831.3537, 859.66364, 708.44305, 755.7499, 882.7338, 809.75415, 838.0136]
2025-08-07 09:53:11,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [259.0, 240.0, 235.0, 260.0, 264.0, 235.0, 240.0, 270.0, 253.0, 261.0]
2025-08-07 09:53:11,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 22 minutes, 43 seconds)
2025-08-07 09:54:49,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:54:55,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1402.98584 ± 793.282
2025-08-07 09:54:55,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [872.1177, 890.47003, 889.56024, 657.41284, 2861.211, 1192.1158, 2663.8672, 2161.1702, 1174.6947, 667.2399]
2025-08-07 09:54:55,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [307.0, 316.0, 312.0, 244.0, 1000.0, 414.0, 921.0, 759.0, 437.0, 260.0]
2025-08-07 09:54:55,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 22 minutes, 3 seconds)
2025-08-07 09:56:33,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:56:37,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 944.57068 ± 144.337
2025-08-07 09:56:37,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [998.21906, 991.87976, 767.68274, 851.61285, 910.5356, 820.3941, 1325.6403, 910.0287, 942.3043, 927.40955]
2025-08-07 09:56:37,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [325.0, 322.0, 253.0, 269.0, 303.0, 303.0, 435.0, 298.0, 291.0, 289.0]
2025-08-07 09:56:37,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 19 minutes, 17 seconds)
2025-08-07 09:58:15,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:58:19,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 990.08270 ± 32.288
2025-08-07 09:58:19,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [990.61847, 1021.7026, 1025.2505, 960.85626, 932.6751, 993.41034, 1021.20483, 1013.0928, 1001.0197, 940.9967]
2025-08-07 09:58:19,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [318.0, 316.0, 317.0, 302.0, 298.0, 316.0, 329.0, 325.0, 318.0, 301.0]
2025-08-07 09:58:19,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 19 minutes, 7 seconds)
2025-08-07 09:59:55,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:59:58,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 528.41602 ± 523.420
2025-08-07 09:59:58,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [489.09418, 1284.3951, 1077.4912, 1399.208, 66.415665, 66.2869, 60.93479, 63.742157, 709.3387, 67.25355]
2025-08-07 09:59:58,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 411.0, 353.0, 438.0, 50.0, 50.0, 47.0, 48.0, 268.0, 50.0]
2025-08-07 09:59:58,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 17 minutes, 1 second)
2025-08-07 10:01:37,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:01:45,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2093.98120 ± 568.191
2025-08-07 10:01:45,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1619.5494, 1880.7985, 1718.0328, 1872.1292, 1416.9857, 2221.8794, 3055.3276, 3116.4958, 1624.486, 2414.1265]
2025-08-07 10:01:45,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [507.0, 585.0, 545.0, 586.0, 444.0, 697.0, 947.0, 1000.0, 514.0, 748.0]
2025-08-07 10:01:45,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (2093.98) for latency MM1Queue_a033_s075
2025-08-07 10:01:45,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 17 minutes, 3 seconds)
2025-08-07 10:03:23,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:03:27,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1045.83118 ± 117.508
2025-08-07 10:03:27,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [947.96204, 1246.4061, 1064.2261, 985.6397, 909.9085, 912.27386, 1154.828, 1190.3728, 939.0397, 1107.6542]
2025-08-07 10:03:27,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [310.0, 400.0, 340.0, 323.0, 294.0, 296.0, 367.0, 377.0, 303.0, 349.0]
2025-08-07 10:03:27,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 14 minutes, 51 seconds)
2025-08-07 10:05:11,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:05:12,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 236.68848 ± 3.891
2025-08-07 10:05:12,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [238.44594, 229.61372, 240.29436, 238.3528, 240.78499, 233.82237, 236.33543, 237.21375, 241.30946, 230.71184]
2025-08-07 10:05:12,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 116.0, 121.0, 120.0, 121.0, 119.0, 119.0, 120.0, 122.0, 117.0]
2025-08-07 10:05:12,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 14 minutes)
2025-08-07 10:06:52,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:07:02,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2586.29541 ± 480.767
2025-08-07 10:07:02,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1972.5806, 1955.5304, 3052.1216, 3054.0098, 1925.8545, 3076.1282, 2385.9377, 2487.3914, 2789.3018, 3164.0986]
2025-08-07 10:07:02,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [621.0, 616.0, 1000.0, 1000.0, 605.0, 1000.0, 743.0, 785.0, 891.0, 990.0]
2025-08-07 10:07:02,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (2586.30) for latency MM1Queue_a033_s075
2025-08-07 10:07:02,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 14 minutes, 14 seconds)
2025-08-07 10:08:32,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:39,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1778.17676 ± 646.366
2025-08-07 10:08:39,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1504.4419, 3134.0583, 2258.9788, 2543.696, 1552.2402, 1389.3967, 752.41986, 1494.3763, 1452.0869, 1700.0731]
2025-08-07 10:08:39,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [516.0, 1000.0, 705.0, 786.0, 503.0, 437.0, 287.0, 471.0, 454.0, 547.0]
2025-08-07 10:08:39,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 11 minutes, 59 seconds)
2025-08-07 10:10:19,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:25,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1655.74084 ± 994.822
2025-08-07 10:10:25,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [985.51544, 1210.5416, 3090.3433, 1162.3959, 3075.325, 1517.3528, 1163.2904, 3087.8992, 1144.8682, 119.8779]
2025-08-07 10:10:25,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [318.0, 373.0, 1000.0, 357.0, 1000.0, 466.0, 356.0, 1000.0, 352.0, 74.0]
2025-08-07 10:10:25,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 9 minutes, 52 seconds)
2025-08-07 10:12:06,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:13,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2118.29346 ± 613.237
2025-08-07 10:12:13,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1373.581, 1531.0983, 1046.9077, 2202.4578, 2099.6309, 2857.2659, 2396.7317, 3135.8477, 2292.279, 2247.1357]
2025-08-07 10:12:13,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [462.0, 462.0, 330.0, 676.0, 640.0, 875.0, 734.0, 1000.0, 704.0, 690.0]
2025-08-07 10:12:13,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 9 minutes, 45 seconds)
2025-08-07 10:13:50,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:02,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2950.93018 ± 534.503
2025-08-07 10:14:02,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3138.7742, 3123.1143, 3130.7375, 3132.3347, 3131.86, 3130.298, 1347.5088, 3119.0278, 3122.4705, 3133.173]
2025-08-07 10:14:02,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 456.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:14:02,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (2950.93) for latency MM1Queue_a033_s075
2025-08-07 10:14:02,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 8 minutes, 52 seconds)
2025-08-07 10:15:39,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:48,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2327.39282 ± 985.133
2025-08-07 10:15:48,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [319.4973, 2369.3684, 1415.1587, 3122.774, 2947.8582, 3040.918, 3032.458, 3022.543, 3031.2478, 972.1027]
2025-08-07 10:15:48,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 806.0, 417.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 310.0]
2025-08-07 10:15:48,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 6 minutes, 19 seconds)
2025-08-07 10:17:25,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:30,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1360.59131 ± 144.733
2025-08-07 10:17:30,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1343.751, 1301.6564, 1431.4943, 1566.337, 1267.8599, 1211.4556, 1678.278, 1294.2148, 1249.3344, 1261.532]
2025-08-07 10:17:30,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [407.0, 399.0, 435.0, 474.0, 389.0, 377.0, 508.0, 397.0, 380.0, 388.0]
2025-08-07 10:17:30,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 5 minutes, 40 seconds)
2025-08-07 10:19:16,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:28,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2886.53784 ± 457.203
2025-08-07 10:19:28,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3134.2725, 1939.8678, 3147.1562, 3145.8867, 3143.5137, 2895.3088, 3149.9805, 3127.948, 2030.0111, 3151.434]
2025-08-07 10:19:28,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 585.0, 1000.0, 1000.0, 1000.0, 880.0, 1000.0, 1000.0, 625.0, 1000.0]
2025-08-07 10:19:28,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 6 minutes, 38 seconds)
2025-08-07 10:20:58,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:10,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2942.08154 ± 30.630
2025-08-07 10:21:10,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2930.182, 2947.3882, 2974.5024, 2965.1318, 2895.0166, 2889.3672, 2949.5586, 2979.9976, 2968.5781, 2921.093]
2025-08-07 10:21:10,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:21:10,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 3 minutes, 29 seconds)
2025-08-07 10:22:57,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1600.87732 ± 1064.753
2025-08-07 10:23:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [369.26993, 3077.2285, 381.30252, 2020.1644, 3064.0662, 2020.6316, 378.8396, 374.2727, 2031.6533, 2291.345]
2025-08-07 10:23:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [166.0, 1000.0, 164.0, 618.0, 964.0, 631.0, 170.0, 168.0, 624.0, 701.0]
2025-08-07 10:23:03,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 2 minutes, 43 seconds)
2025-08-07 10:24:35,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:36,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 89.11207 ± 2.867
2025-08-07 10:24:36,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [87.002365, 88.007324, 88.82877, 93.102135, 93.13328, 85.74762, 85.989044, 92.00125, 85.96066, 91.34818]
2025-08-07 10:24:36,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 59.0, 60.0, 62.0, 62.0, 58.0, 59.0, 61.0, 58.0, 61.0]
2025-08-07 10:24:36,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 57 minutes, 46 seconds)
2025-08-07 10:26:21,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:28,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1810.04041 ± 455.784
2025-08-07 10:26:28,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1396.3773, 1416.2405, 2409.4597, 1942.7537, 1414.2678, 2383.203, 1962.2637, 1129.5604, 2418.1614, 1628.1178]
2025-08-07 10:26:28,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [470.0, 471.0, 780.0, 593.0, 468.0, 775.0, 608.0, 339.0, 782.0, 537.0]
2025-08-07 10:26:28,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 58 minutes, 24 seconds)
2025-08-07 10:28:03,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:11,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1927.98853 ± 973.091
2025-08-07 10:28:11,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2999.8608, 238.48984, 1425.0986, 1140.3539, 3056.8054, 1481.3738, 3073.9775, 3000.704, 1716.2113, 1147.009]
2025-08-07 10:28:11,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 121.0, 462.0, 372.0, 1000.0, 471.0, 1000.0, 1000.0, 525.0, 375.0]
2025-08-07 10:28:11,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 53 minutes, 18 seconds)
2025-08-07 10:29:47,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:51,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1003.79871 ± 1039.921
2025-08-07 10:29:51,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2996.3962, 1240.022, 340.36896, 627.5495, 3035.2659, 340.35065, 338.78278, 421.61758, 359.04242, 338.59143]
2025-08-07 10:29:51,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 385.0, 156.0, 256.0, 1000.0, 154.0, 155.0, 181.0, 160.0, 155.0]
2025-08-07 10:29:51,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 51 minutes, 8 seconds)
2025-08-07 10:31:29,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:40,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2842.75439 ± 521.522
2025-08-07 10:31:40,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3040.0103, 3199.125, 3145.7854, 3154.8064, 2818.8604, 2763.2852, 2243.3115, 3266.7751, 1554.3628, 3241.22]
2025-08-07 10:31:40,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [931.0, 1000.0, 1000.0, 1000.0, 854.0, 842.0, 685.0, 1000.0, 467.0, 993.0]
2025-08-07 10:31:40,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 48 minutes, 25 seconds)
2025-08-07 10:33:25,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:32,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1954.63086 ± 788.867
2025-08-07 10:33:32,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1336.6423, 3123.7554, 3110.7588, 1889.6914, 1601.456, 1247.2428, 1628.0134, 1159.6267, 3125.6555, 1323.4675]
2025-08-07 10:33:32,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [392.0, 1000.0, 1000.0, 549.0, 464.0, 375.0, 471.0, 346.0, 1000.0, 387.0]
2025-08-07 10:33:32,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 50 minutes, 48 seconds)
2025-08-07 10:35:08,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:14,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1928.71680 ± 708.601
2025-08-07 10:35:14,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3258.2014, 1415.904, 1420.8965, 1638.5739, 1385.8352, 2424.5718, 1386.1141, 1350.678, 3143.2717, 1863.1215]
2025-08-07 10:35:14,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 424.0, 427.0, 479.0, 413.0, 732.0, 414.0, 403.0, 1000.0, 551.0]
2025-08-07 10:35:14,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 47 minutes, 3 seconds)
2025-08-07 10:36:49,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:52,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 921.03943 ± 9.852
2025-08-07 10:36:52,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [936.82043, 913.2833, 925.0484, 912.66675, 914.8591, 919.8984, 934.5528, 918.3249, 904.8167, 930.12445]
2025-08-07 10:36:52,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [287.0, 280.0, 283.0, 280.0, 280.0, 282.0, 287.0, 281.0, 277.0, 285.0]
2025-08-07 10:36:52,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 44 minutes, 20 seconds)
2025-08-07 10:38:30,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:42,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 3012.88818 ± 300.116
2025-08-07 10:38:42,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3147.827, 3164.2341, 3144.1284, 2189.8198, 3149.0437, 3153.4617, 3134.7517, 3133.3862, 3169.188, 2743.041]
2025-08-07 10:38:42,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 663.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 837.0]
2025-08-07 10:38:42,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (3012.89) for latency MM1Queue_a033_s075
2025-08-07 10:38:42,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 44 minutes, 19 seconds)
2025-08-07 10:40:22,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:33,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2895.44922 ± 477.333
2025-08-07 10:40:33,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3217.835, 3161.1401, 1876.5408, 3131.6306, 3122.4785, 3094.2356, 2014.2878, 3134.2734, 3079.606, 3122.4656]
2025-08-07 10:40:33,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 562.0, 1000.0, 1000.0, 1000.0, 649.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:40:33,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 43 minutes, 7 seconds)
2025-08-07 10:42:14,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:25,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2822.05322 ± 507.735
2025-08-07 10:42:25,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3102.198, 2235.271, 3115.2988, 3129.227, 3108.5342, 3116.9036, 3084.5244, 2710.164, 1535.1394, 3083.2727]
2025-08-07 10:42:25,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 683.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 814.0, 464.0, 1000.0]
2025-08-07 10:42:25,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 41 minutes, 20 seconds)
2025-08-07 10:44:02,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:12,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2536.17920 ± 635.700
2025-08-07 10:44:12,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3174.6116, 3186.8977, 1836.599, 1734.8635, 3169.9583, 1550.776, 3155.8542, 2341.0598, 2235.3164, 2975.854]
2025-08-07 10:44:12,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 533.0, 511.0, 1000.0, 486.0, 1000.0, 689.0, 675.0, 896.0]
2025-08-07 10:44:12,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 40 minutes, 18 seconds)
2025-08-07 10:45:54,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:00,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1612.11255 ± 821.748
2025-08-07 10:46:00,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [894.8573, 2425.448, 896.6665, 1792.6549, 2799.4946, 1237.0726, 897.8372, 1161.1669, 900.0824, 3115.8464]
2025-08-07 10:46:00,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [325.0, 794.0, 325.0, 547.0, 847.0, 411.0, 325.0, 406.0, 324.0, 1000.0]
2025-08-07 10:46:00,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 40 minutes, 26 seconds)
2025-08-07 10:47:35,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:38,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 998.01062 ± 195.150
2025-08-07 10:47:38,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1442.4545, 960.95905, 944.5345, 765.2292, 1110.8334, 1226.7893, 927.2473, 872.2702, 822.7061, 907.08325]
2025-08-07 10:47:38,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [435.0, 295.0, 292.0, 248.0, 330.0, 368.0, 289.0, 273.0, 260.0, 284.0]
2025-08-07 10:47:38,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 36 minutes, 35 seconds)
2025-08-07 10:49:15,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:26,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2757.48267 ± 548.592
2025-08-07 10:49:26,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1971.4216, 2252.3115, 2695.329, 3156.936, 3141.5134, 3147.4927, 3168.6907, 3170.8354, 3185.372, 1684.9242]
2025-08-07 10:49:26,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [597.0, 673.0, 797.0, 944.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 506.0]
2025-08-07 10:49:26,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 34 minutes, 7 seconds)
2025-08-07 10:51:03,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:16,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 3108.36035 ± 28.397
2025-08-07 10:51:16,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3100.852, 3090.8103, 3099.099, 3089.799, 3171.6328, 3084.6353, 3091.257, 3152.546, 3114.9666, 3088.0063]
2025-08-07 10:51:16,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:51:16,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (3108.36) for latency MM1Queue_a033_s075
2025-08-07 10:51:16,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 31 minutes, 55 seconds)
2025-08-07 10:52:52,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:54,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 411.61890 ± 77.986
2025-08-07 10:52:54,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [645.1265, 389.15686, 389.79425, 391.2197, 386.4002, 375.9824, 386.1834, 380.0787, 381.2232, 391.02377]
2025-08-07 10:52:54,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [255.0, 164.0, 164.0, 165.0, 163.0, 160.0, 163.0, 161.0, 162.0, 165.0]
2025-08-07 10:52:54,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 28 minutes, 50 seconds)
2025-08-07 10:54:31,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:39,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2288.45386 ± 553.828
2025-08-07 10:54:39,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1749.2822, 2529.9016, 1621.3611, 2451.3535, 3182.1445, 2042.4922, 1736.9583, 2010.579, 2269.1248, 3291.3418]
2025-08-07 10:54:39,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [525.0, 751.0, 470.0, 731.0, 1000.0, 608.0, 517.0, 598.0, 675.0, 981.0]
2025-08-07 10:54:39,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 26 minutes, 31 seconds)
2025-08-07 10:56:20,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:25,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1716.21118 ± 334.501
2025-08-07 10:56:25,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2628.7292, 1647.5394, 1652.1101, 1686.0428, 1548.1664, 1540.9755, 1354.195, 1729.7633, 1481.5947, 1892.9955]
2025-08-07 10:56:25,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [780.0, 488.0, 488.0, 498.0, 453.0, 466.0, 402.0, 518.0, 435.0, 561.0]
2025-08-07 10:56:25,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 26 minutes, 6 seconds)
2025-08-07 10:58:04,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:16,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2920.70239 ± 340.223
2025-08-07 10:58:16,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2029.1841, 3136.8335, 2530.8574, 3079.1396, 3042.5095, 3087.981, 3081.7424, 3090.1885, 3054.158, 3074.4294]
2025-08-07 10:58:16,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [664.0, 1000.0, 816.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:58:16,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 24 minutes, 47 seconds)
2025-08-07 10:59:52,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:58,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1600.70740 ± 301.011
2025-08-07 10:59:58,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1723.5271, 1402.3959, 1387.8909, 1698.0864, 1451.3293, 1427.9673, 1387.0723, 2417.7512, 1438.3673, 1672.6869]
2025-08-07 10:59:58,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [516.0, 418.0, 413.0, 498.0, 438.0, 431.0, 418.0, 722.0, 436.0, 492.0]
2025-08-07 10:59:58,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 21 minutes, 49 seconds)
2025-08-07 11:01:35,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2974.08154 ± 26.923
2025-08-07 11:01:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3002.3062, 2941.1685, 2935.389, 2960.7234, 2967.5962, 2999.729, 3023.282, 2985.954, 2971.1436, 2953.5227]
2025-08-07 11:01:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:01:48,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 21 minutes, 45 seconds)
2025-08-07 11:03:32,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:32,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 36.24818 ± 1.410
2025-08-07 11:03:32,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [35.558304, 35.61954, 33.75092, 39.257957, 35.816666, 37.223602, 37.173615, 35.68087, 35.363403, 37.036926]
2025-08-07 11:03:32,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 33.0, 32.0, 35.0, 33.0, 34.0, 34.0, 33.0, 33.0, 34.0]
2025-08-07 11:03:32,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 19 minutes, 53 seconds)
2025-08-07 11:05:09,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:14,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1395.43604 ± 302.502
2025-08-07 11:05:14,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1331.3633, 1177.1781, 1163.7443, 2217.9768, 1189.0969, 1442.9679, 1529.2683, 1297.8284, 1154.9597, 1449.9767]
2025-08-07 11:05:14,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [394.0, 345.0, 342.0, 656.0, 361.0, 420.0, 454.0, 382.0, 336.0, 426.0]
2025-08-07 11:05:14,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 17 minutes, 26 seconds)
2025-08-07 11:06:55,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:02,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1893.10034 ± 598.091
2025-08-07 11:07:02,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1406.2916, 1464.9769, 2157.068, 1486.0634, 1405.4679, 2633.0396, 1404.5048, 2891.521, 2695.5823, 1386.4893]
2025-08-07 11:07:02,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [417.0, 435.0, 639.0, 448.0, 416.0, 785.0, 416.0, 863.0, 804.0, 411.0]
2025-08-07 11:07:02,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 15 minutes, 26 seconds)
2025-08-07 11:08:33,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:37,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 911.09131 ± 31.744
2025-08-07 11:08:37,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [846.4687, 877.0697, 875.9663, 923.4426, 955.38464, 921.9741, 937.5671, 923.5293, 929.6252, 919.88544]
2025-08-07 11:08:37,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [259.0, 266.0, 266.0, 283.0, 296.0, 280.0, 287.0, 282.0, 282.0, 277.0]
2025-08-07 11:08:37,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 12 minutes, 37 seconds)
2025-08-07 11:10:13,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:18,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1289.45129 ± 522.781
2025-08-07 11:10:18,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1189.4397, 1130.3896, 1214.5593, 1225.9286, 2809.4885, 748.2724, 1155.0677, 1139.1693, 1141.5421, 1140.6549]
2025-08-07 11:10:18,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [348.0, 333.0, 359.0, 364.0, 842.0, 252.0, 339.0, 334.0, 334.0, 336.0]
2025-08-07 11:10:18,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 44 seconds)
2025-08-07 11:11:55,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:06,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2893.02539 ± 481.463
2025-08-07 11:12:06,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1893.0074, 3146.2815, 3216.541, 3181.5762, 2796.1897, 3124.9067, 3191.5332, 2026.0303, 3159.2114, 3194.9766]
2025-08-07 11:12:06,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [552.0, 1000.0, 1000.0, 1000.0, 827.0, 1000.0, 1000.0, 594.0, 1000.0, 1000.0]
2025-08-07 11:12:06,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 8 minutes, 34 seconds)
2025-08-07 11:13:44,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:13:54,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2487.25024 ± 822.312
2025-08-07 11:13:54,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3051.8315, 3134.407, 3143.919, 2279.9382, 1223.6943, 1214.7402, 3100.413, 3112.5854, 3171.8784, 1439.093]
2025-08-07 11:13:54,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 685.0, 362.0, 369.0, 1000.0, 1000.0, 1000.0, 429.0]
2025-08-07 11:13:54,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 7 minutes, 39 seconds)
2025-08-07 11:15:33,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:42,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2471.00854 ± 599.890
2025-08-07 11:15:42,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2650.7927, 3157.0767, 1757.9827, 3207.2742, 3348.703, 1808.4974, 1928.1371, 1974.9987, 2821.09, 2055.5352]
2025-08-07 11:15:42,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [790.0, 1000.0, 529.0, 1000.0, 1000.0, 535.0, 568.0, 586.0, 841.0, 608.0]
2025-08-07 11:15:43,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 5 minutes, 55 seconds)
2025-08-07 11:17:19,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:23,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1124.35718 ± 96.197
2025-08-07 11:17:23,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1199.7683, 1203.1204, 1014.3007, 1164.4891, 1022.53235, 1199.7678, 1165.9988, 919.4487, 1205.2721, 1148.872]
2025-08-07 11:17:23,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [353.0, 355.0, 297.0, 338.0, 300.0, 353.0, 339.0, 278.0, 355.0, 333.0]
2025-08-07 11:17:23,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 4 minutes, 56 seconds)
2025-08-07 11:19:01,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:04,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 690.30334 ± 640.488
2025-08-07 11:19:04,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [58.500023, 1505.6687, 51.00634, 60.668846, 51.103527, 1327.08, 1206.0735, 1370.3885, 1221.3701, 51.17389]
2025-08-07 11:19:04,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 441.0, 43.0, 48.0, 43.0, 387.0, 357.0, 397.0, 363.0, 43.0]
2025-08-07 11:19:04,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 3 minutes, 3 seconds)
2025-08-07 11:20:41,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:51,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2733.22241 ± 539.828
2025-08-07 11:20:51,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3165.4573, 3208.783, 3133.971, 2399.5564, 3205.109, 2882.5103, 3281.4023, 1748.7555, 2008.3207, 2298.3586]
2025-08-07 11:20:51,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 727.0, 1000.0, 873.0, 994.0, 538.0, 607.0, 700.0]
2025-08-07 11:20:51,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 1 minute, 13 seconds)
2025-08-07 11:22:34,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:40,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1874.69312 ± 576.443
2025-08-07 11:22:40,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1540.1871, 1486.7368, 2415.219, 1603.8411, 3236.6248, 1238.0283, 2240.7488, 1993.1257, 1485.9669, 1506.4539]
2025-08-07 11:22:40,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [451.0, 447.0, 726.0, 473.0, 1000.0, 378.0, 681.0, 611.0, 445.0, 461.0]
2025-08-07 11:22:40,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 59 minutes, 39 seconds)
2025-08-07 11:24:15,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:18,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1132.16602 ± 156.306
2025-08-07 11:24:18,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1107.5259, 1112.2899, 1462.7765, 1327.116, 986.1403, 1166.203, 1084.6235, 948.3492, 945.01013, 1181.6257]
2025-08-07 11:24:18,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [317.0, 319.0, 429.0, 380.0, 293.0, 337.0, 311.0, 286.0, 286.0, 341.0]
2025-08-07 11:24:18,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 45 seconds)
2025-08-07 11:25:59,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:04,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1311.69934 ± 1519.889
2025-08-07 11:26:04,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [70.10241, 72.277695, 3167.1677, 3178.2666, 3173.0762, 69.95362, 69.63102, 3174.181, 70.289856, 72.04728]
2025-08-07 11:26:04,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [51.0, 52.0, 1000.0, 1000.0, 1000.0, 51.0, 51.0, 1000.0, 51.0, 52.0]
2025-08-07 11:26:04,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 36 seconds)
2025-08-07 11:27:47,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:53,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1808.50903 ± 694.345
2025-08-07 11:27:53,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1346.8821, 1571.1395, 1166.8097, 1445.9857, 3122.95, 1754.902, 1878.437, 3141.8484, 1457.5775, 1198.5594]
2025-08-07 11:27:53,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [392.0, 455.0, 342.0, 421.0, 1000.0, 517.0, 547.0, 1000.0, 427.0, 354.0]
2025-08-07 11:27:53,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 44 seconds)
2025-08-07 11:29:31,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:43,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 3112.60864 ± 15.596
2025-08-07 11:29:43,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3093.2224, 3120.1228, 3108.589, 3124.392, 3120.44, 3085.363, 3134.6877, 3094.7393, 3127.1714, 3117.3596]
2025-08-07 11:29:43,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:29:43,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (3112.61) for latency MM1Queue_a033_s075
2025-08-07 11:29:43,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 53 minutes, 12 seconds)
2025-08-07 11:31:19,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:29,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2549.25732 ± 1025.527
2025-08-07 11:31:29,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3062.034, 3052.6846, 3067.5747, 3044.2668, 3047.9148, 3044.4795, 158.17252, 3062.5999, 3061.7793, 891.0657]
2025-08-07 11:31:29,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 88.0, 1000.0, 1000.0, 322.0]
2025-08-07 11:31:29,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 51 minutes, 8 seconds)
2025-08-07 11:33:01,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:10,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2301.77173 ± 944.030
2025-08-07 11:33:10,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3078.9758, 1219.3156, 3074.9365, 3058.8633, 3041.0774, 929.3578, 1463.0724, 3073.505, 1015.0153, 3063.5977]
2025-08-07 11:33:10,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 380.0, 1000.0, 1000.0, 1000.0, 283.0, 423.0, 1000.0, 304.0, 1000.0]
2025-08-07 11:33:10,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 49 minutes, 35 seconds)
2025-08-07 11:34:45,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:52,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1831.35413 ± 1381.315
2025-08-07 11:34:52,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3140.1477, 46.28514, 1163.5353, 3261.081, 3134.501, 3158.1824, 49.163544, 3069.5479, 1241.5933, 49.50436]
2025-08-07 11:34:52,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 40.0, 336.0, 985.0, 1000.0, 1000.0, 42.0, 917.0, 358.0, 42.0]
2025-08-07 11:34:52,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 27 seconds)
2025-08-07 11:36:28,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:38,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2836.63940 ± 321.487
2025-08-07 11:36:38,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2542.8066, 2532.0144, 3163.8193, 2549.9104, 3164.7864, 2527.443, 2554.7122, 2787.287, 3296.2751, 3247.3408]
2025-08-07 11:36:38,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [766.0, 761.0, 1000.0, 766.0, 960.0, 763.0, 767.0, 849.0, 981.0, 1000.0]
2025-08-07 11:36:38,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 28 seconds)
2025-08-07 11:38:20,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:32,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 3130.46753 ± 29.445
2025-08-07 11:38:32,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3105.8882, 3171.5618, 3109.3967, 3109.9055, 3168.9033, 3101.7751, 3162.705, 3162.0469, 3106.24, 3106.253]
2025-08-07 11:38:32,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:38:32,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (3130.47) for latency MM1Queue_a033_s075
2025-08-07 11:38:32,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 44 minutes, 6 seconds)
2025-08-07 11:40:05,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:15,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2783.07422 ± 816.603
2025-08-07 11:40:15,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3280.8535, 1094.4347, 3241.0798, 3162.109, 3170.7961, 3192.2197, 3164.3804, 1210.5887, 3156.6707, 3157.6113]
2025-08-07 11:40:15,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 319.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 358.0, 1000.0, 1000.0]
2025-08-07 11:40:15,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 4 seconds)
2025-08-07 11:41:55,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:06,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2823.65332 ± 511.241
2025-08-07 11:42:06,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3223.7395, 3161.8936, 1995.4485, 3142.3027, 3155.8599, 2038.6643, 3147.413, 3119.3237, 2099.2795, 3152.6099]
2025-08-07 11:42:06,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [966.0, 1000.0, 612.0, 1000.0, 1000.0, 621.0, 1000.0, 991.0, 636.0, 1000.0]
2025-08-07 11:42:06,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 6 seconds)
2025-08-07 11:43:38,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:45,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1798.46155 ± 1248.609
2025-08-07 11:43:45,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [38.888584, 2559.4883, 3118.2173, 1718.1919, 2554.4282, 40.678375, 2938.2178, 3249.255, 38.694427, 1728.5558]
2025-08-07 11:43:45,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [36.0, 785.0, 1000.0, 527.0, 776.0, 37.0, 891.0, 998.0, 36.0, 521.0]
2025-08-07 11:43:45,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 7 seconds)
2025-08-07 11:45:23,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2463.93335 ± 585.046
2025-08-07 11:45:32,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3068.6118, 2477.8918, 3213.7734, 2767.1914, 1869.7803, 2430.6194, 3235.7764, 1900.4387, 2256.4402, 1418.8116]
2025-08-07 11:45:32,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [896.0, 733.0, 1000.0, 816.0, 544.0, 714.0, 1000.0, 550.0, 659.0, 409.0]
2025-08-07 11:45:32,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 22 seconds)
2025-08-07 11:47:02,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:05,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1010.67493 ± 50.613
2025-08-07 11:47:05,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [965.4084, 1066.2203, 1107.8998, 1022.9124, 948.389, 1026.5804, 1008.31067, 963.54553, 1044.7644, 952.7174]
2025-08-07 11:47:05,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [296.0, 315.0, 326.0, 306.0, 289.0, 307.0, 303.0, 296.0, 311.0, 293.0]
2025-08-07 11:47:05,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 11 seconds)
2025-08-07 11:48:39,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:48,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2442.34570 ± 870.449
2025-08-07 11:48:48,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2927.7708, 1367.9136, 3166.15, 3177.9688, 3145.8767, 3318.9565, 1419.2454, 1407.9185, 3158.0588, 1333.5974]
2025-08-07 11:48:48,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [867.0, 407.0, 1000.0, 1000.0, 1000.0, 1000.0, 414.0, 414.0, 1000.0, 389.0]
2025-08-07 11:48:48,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 29 seconds)
2025-08-07 11:50:26,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:37,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 3040.22778 ± 315.947
2025-08-07 11:50:37,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3161.3308, 3170.9634, 3146.3745, 3155.268, 3159.7595, 2985.1982, 3176.8337, 3163.0725, 3177.1396, 2106.3357]
2025-08-07 11:50:37,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 909.0, 1000.0, 1000.0, 1000.0, 631.0]
2025-08-07 11:50:37,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 40 seconds)
2025-08-07 11:52:11,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:23,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 3062.70679 ± 212.714
2025-08-07 11:52:23,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3160.0579, 3216.68, 3150.503, 3153.6948, 3150.2231, 3149.596, 3154.408, 2490.5977, 3150.1096, 2851.198]
2025-08-07 11:52:23,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 752.0, 1000.0, 868.0]
2025-08-07 11:52:23,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 19 seconds)
2025-08-07 11:54:00,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:03,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1180.25916 ± 36.477
2025-08-07 11:54:03,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1196.7521, 1264.3286, 1167.5298, 1150.9429, 1201.3473, 1189.023, 1180.7571, 1165.4875, 1170.2272, 1116.1962]
2025-08-07 11:54:03,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [353.0, 373.0, 344.0, 340.0, 354.0, 350.0, 349.0, 343.0, 344.0, 330.0]
2025-08-07 11:54:03,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 17 seconds)
2025-08-07 11:55:38,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:46,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2309.22021 ± 604.378
2025-08-07 11:55:46,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3156.7642, 1836.846, 3194.557, 3196.0466, 1461.8441, 1829.5673, 2158.2212, 1990.3044, 2074.9758, 2193.0771]
2025-08-07 11:55:46,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 552.0, 1000.0, 1000.0, 424.0, 551.0, 645.0, 600.0, 610.0, 648.0]
2025-08-07 11:55:46,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 2 seconds)
2025-08-07 11:57:20,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:24,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1040.93542 ± 47.702
2025-08-07 11:57:24,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1086.9413, 1005.80273, 1033.8652, 972.1367, 1005.40204, 999.1254, 1046.9264, 1026.6708, 1120.0104, 1112.4734]
2025-08-07 11:57:24,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [329.0, 318.0, 320.0, 311.0, 320.0, 317.0, 324.0, 319.0, 336.0, 334.0]
2025-08-07 11:57:24,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 4 seconds)
2025-08-07 11:59:04,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:12,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2327.13892 ± 856.008
2025-08-07 11:59:12,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3194.5215, 3216.6274, 3170.5042, 1485.5096, 3181.8752, 1463.1691, 1380.6888, 3147.8179, 1516.9166, 1513.7595]
2025-08-07 11:59:12,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 436.0, 1000.0, 430.0, 403.0, 1000.0, 443.0, 444.0]
2025-08-07 11:59:12,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 19 seconds)
2025-08-07 12:00:45,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:49,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1249.15881 ± 77.827
2025-08-07 12:00:49,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1190.9532, 1231.9052, 1219.4993, 1252.5867, 1405.1395, 1234.3733, 1185.0657, 1187.2109, 1391.8123, 1193.0425]
2025-08-07 12:00:49,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [358.0, 367.0, 364.0, 375.0, 413.0, 369.0, 355.0, 355.0, 406.0, 355.0]
2025-08-07 12:00:49,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 15 seconds)
2025-08-07 12:02:27,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:30,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1098.52173 ± 53.281
2025-08-07 12:02:30,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1088.0238, 1000.0732, 1060.4182, 1100.118, 1149.2498, 1052.9406, 1150.5542, 1120.9716, 1191.0292, 1071.8375]
2025-08-07 12:02:30,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [321.0, 304.0, 315.0, 324.0, 338.0, 313.0, 338.0, 329.0, 351.0, 318.0]
2025-08-07 12:02:31,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 35 seconds)
2025-08-07 12:04:06,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:04:16,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2642.46118 ± 694.038
2025-08-07 12:04:16,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2018.2233, 3174.5107, 2043.3672, 3177.0767, 3182.2263, 1205.4246, 2100.361, 3175.9097, 3171.729, 3175.7825]
2025-08-07 12:04:16,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [605.0, 1000.0, 610.0, 1000.0, 1000.0, 360.0, 635.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:04:16,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes)
2025-08-07 12:05:52,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:55,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1132.79553 ± 52.397
2025-08-07 12:05:55,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1187.1327, 1230.9775, 1136.122, 1143.2816, 1103.0536, 1127.3909, 1032.6409, 1158.1929, 1076.6586, 1132.505]
2025-08-07 12:05:55,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [351.0, 367.0, 336.0, 337.0, 328.0, 334.0, 312.0, 342.0, 322.0, 334.0]
2025-08-07 12:05:55,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 20 seconds)
2025-08-07 12:07:35,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:38,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1136.25110 ± 52.863
2025-08-07 12:07:38,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1129.7222, 1192.8536, 1229.734, 1171.0889, 1153.9945, 1096.8746, 1166.0103, 1085.6128, 1078.2816, 1058.3385]
2025-08-07 12:07:38,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [336.0, 355.0, 370.0, 348.0, 343.0, 326.0, 346.0, 323.0, 321.0, 317.0]
2025-08-07 12:07:38,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 29 seconds)
2025-08-07 12:09:08,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:11,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1011.50403 ± 57.074
2025-08-07 12:09:11,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [997.5057, 998.6882, 918.75287, 1106.7161, 1117.3148, 999.49976, 1005.10144, 984.62964, 963.5105, 1023.3213]
2025-08-07 12:09:11,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [303.0, 304.0, 285.0, 331.0, 334.0, 304.0, 306.0, 301.0, 297.0, 310.0]
2025-08-07 12:09:11,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 42 seconds)
2025-08-07 12:10:45,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:51,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1560.26038 ± 289.615
2025-08-07 12:10:51,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1720.8159, 1980.4128, 1449.2926, 1218.6682, 1205.9893, 1451.6766, 1686.1484, 1726.4866, 1177.9619, 1985.1519]
2025-08-07 12:10:51,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [540.0, 619.0, 463.0, 388.0, 376.0, 465.0, 512.0, 521.0, 346.0, 599.0]
2025-08-07 12:10:51,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes)
2025-08-07 12:12:33,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:37,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1365.34412 ± 754.867
2025-08-07 12:12:37,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [51.26731, 2883.949, 1040.2958, 1126.7106, 1264.1918, 2406.8772, 1688.9036, 1184.4099, 939.17596, 1067.6604]
2025-08-07 12:12:37,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 857.0, 315.0, 340.0, 379.0, 718.0, 514.0, 358.0, 296.0, 321.0]
2025-08-07 12:12:37,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 21 seconds)
2025-08-07 12:14:06,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:15,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2476.09033 ± 850.514
2025-08-07 12:14:15,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3139.5376, 3136.8276, 1202.1572, 1022.7021, 1700.4288, 1986.468, 3145.9932, 3152.078, 3130.0742, 3144.6372]
2025-08-07 12:14:15,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 368.0, 311.0, 504.0, 582.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:14:15,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 39 seconds)
2025-08-07 12:15:52,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:00,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2256.25415 ± 945.243
2025-08-07 12:16:00,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1098.869, 3169.7083, 1120.9055, 3159.8743, 1232.3905, 2080.8184, 3138.6028, 3165.1199, 1206.927, 3189.3247]
2025-08-07 12:16:00,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [329.0, 1000.0, 337.0, 1000.0, 368.0, 637.0, 1000.0, 1000.0, 359.0, 1000.0]
2025-08-07 12:16:00,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes)
2025-08-07 12:17:36,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1145.10498 ± 107.653
2025-08-07 12:17:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1214.6511, 1089.7098, 1239.1841, 1204.4763, 1178.415, 1120.8118, 1219.2355, 870.21466, 1077.9327, 1236.419]
2025-08-07 12:17:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [353.0, 322.0, 362.0, 350.0, 344.0, 346.0, 355.0, 286.0, 320.0, 360.0]
2025-08-07 12:17:40,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 23 seconds)
2025-08-07 12:19:16,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:19,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 896.09863 ± 14.840
2025-08-07 12:19:19,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [916.391, 896.43677, 889.97424, 887.01556, 892.18805, 911.7309, 897.232, 869.25806, 918.4379, 882.3215]
2025-08-07 12:19:19,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [278.0, 272.0, 270.0, 269.0, 269.0, 278.0, 272.0, 263.0, 279.0, 268.0]
2025-08-07 12:19:19,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 41 seconds)
2025-08-07 12:20:52,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:56,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1150.85254 ± 69.881
2025-08-07 12:20:56,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1184.0764, 1097.9279, 1230.2222, 1059.3132, 1174.2131, 1232.7363, 1135.8921, 1176.8011, 1012.38245, 1204.9597]
2025-08-07 12:20:56,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [350.0, 333.0, 364.0, 332.0, 348.0, 366.0, 341.0, 349.0, 321.0, 361.0]
2025-08-07 12:20:56,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1251 [DEBUG]: Training session finished
