2025-08-07 07:25:07,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc25-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:25:07,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc25-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:25:07,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14658fc2f990>}
2025-08-07 07:25:07,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 07:25:07,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 07:25:07,481 baseline-bpql-noiseperc25-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 07:25:07,481 baseline-bpql-noiseperc25-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:25:09,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 07:25:09,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 07:26:37,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:37,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 5.36333 ± 4.822
2025-08-07 07:26:37,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.9165126, 10.90045, 1.033524, -2.3002322, 12.356461, 6.262976, 4.114619, 10.221044, 3.8249767, 8.135999]
2025-08-07 07:26:37,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 36.0, 21.0, 35.0, 37.0, 36.0, 17.0, 35.0, 17.0, 20.0]
2025-08-07 07:26:37,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (5.36) for latency ExtremeClogL1U23
2025-08-07 07:26:37,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 25 minutes, 3 seconds)
2025-08-07 07:28:12,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:13,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: -1.94959 ± 27.311
2025-08-07 07:28:13,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.9072683, -51.164986, 26.906298, 4.9144397, -10.741658, 45.04001, -0.6309726, -1.1698213, -43.588818, 9.032381]
2025-08-07 07:28:13,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 91.0, 92.0, 21.0, 73.0, 105.0, 25.0, 152.0, 165.0, 22.0]
2025-08-07 07:28:13,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 30 minutes, 1 second)
2025-08-07 07:29:48,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:48,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 21.22953 ± 59.175
2025-08-07 07:29:48,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [5.743501, 3.38443, 4.118391, -2.0696268, 180.73976, -2.0451903, 65.99081, -48.95302, -0.5450076, 5.9312625]
2025-08-07 07:29:48,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 19.0, 18.0, 9.0, 169.0, 18.0, 65.0, 163.0, 24.0, 30.0]
2025-08-07 07:29:48,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (21.23) for latency ExtremeClogL1U23
2025-08-07 07:29:48,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 30 minutes, 29 seconds)
2025-08-07 07:31:24,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:25,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 31.01625 ± 62.075
2025-08-07 07:31:25,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [14.544851, 68.954414, 207.51207, 2.6273763, -0.7772074, 3.6664371, 6.747478, 5.9328594, 3.8659663, -2.9118185]
2025-08-07 07:31:25,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 76.0, 157.0, 19.0, 9.0, 14.0, 35.0, 16.0, 70.0, 14.0]
2025-08-07 07:31:25,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (31.02) for latency ExtremeClogL1U23
2025-08-07 07:31:25,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 30 minutes, 14 seconds)
2025-08-07 07:33:00,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 6.98634 ± 11.235
2025-08-07 07:33:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.33763668, 12.015696, 2.9365594, 2.2944736, 0.15953135, 34.371128, 2.364233, 2.8115861, 18.898655, -6.3260994]
2025-08-07 07:33:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 36.0, 18.0, 33.0, 15.0, 75.0, 31.0, 13.0, 29.0, 149.0]
2025-08-07 07:33:01,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 29 minutes, 20 seconds)
2025-08-07 07:34:35,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:36,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 17.04781 ± 26.410
2025-08-07 07:34:36,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [8.560119, 3.3232846, 20.449295, -4.586847, 11.368112, 5.894339, 22.08281, 8.219781, 92.91594, 2.2512681]
2025-08-07 07:34:36,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 141.0, 23.0, 28.0, 21.0, 83.0, 39.0, 121.0, 11.0]
2025-08-07 07:34:36,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 30 minutes, 8 seconds)
2025-08-07 07:36:11,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:11,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 6.59528 ± 7.963
2025-08-07 07:36:11,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.093166, 0.8730772, 27.616148, 1.6228814, 3.0539, 4.0778995, 14.36631, 3.7765963, 1.5807443, 1.8920699]
2025-08-07 07:36:11,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 20.0, 113.0, 16.0, 16.0, 15.0, 42.0, 23.0, 22.0, 16.0]
2025-08-07 07:36:11,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 28 minutes, 23 seconds)
2025-08-07 07:37:46,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:37:47,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 9.12169 ± 7.400
2025-08-07 07:37:47,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [6.4964514, 3.296717, 5.2726326, 13.2571945, 15.687616, 0.58375716, 5.1078367, 11.273261, 3.5857468, 26.655714]
2025-08-07 07:37:47,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 14.0, 17.0, 48.0, 26.0, 36.0, 16.0, 43.0, 14.0, 54.0]
2025-08-07 07:37:47,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 26 minutes, 42 seconds)
2025-08-07 07:39:22,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:22,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 17.57326 ± 24.079
2025-08-07 07:39:22,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [36.117226, -6.6640463, 3.3796341, -1.8429332, 56.8263, 0.32392275, -3.1612854, -1.7134959, 47.81514, 44.65212]
2025-08-07 07:39:22,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [47.0, 22.0, 17.0, 14.0, 74.0, 22.0, 25.0, 20.0, 92.0, 122.0]
2025-08-07 07:39:22,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 24 minutes, 51 seconds)
2025-08-07 07:40:58,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:58,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 5.47724 ± 7.657
2025-08-07 07:40:58,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.6594625, 2.1200442, 8.642358, 3.542527, 5.996839, -5.5664907, 5.7404428, 1.5878205, 3.1925893, 25.856798]
2025-08-07 07:40:58,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 24.0, 20.0, 17.0, 17.0, 24.0, 19.0, 16.0, 15.0, 57.0]
2025-08-07 07:40:58,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 23 minutes, 11 seconds)
2025-08-07 07:42:33,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:34,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 27.16168 ± 61.501
2025-08-07 07:42:34,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [62.68316, 0.45370483, 10.292014, -16.218613, 1.9478964, -5.000306, 10.683096, 1.3028685, 3.6187758, 201.8542]
2025-08-07 07:42:34,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [70.0, 14.0, 161.0, 93.0, 16.0, 16.0, 19.0, 13.0, 16.0, 158.0]
2025-08-07 07:42:34,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 21 minutes, 42 seconds)
2025-08-07 07:44:09,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:09,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 28.83477 ± 47.172
2025-08-07 07:44:09,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.9136814, 63.33586, -3.4155, 154.82036, 7.0978646, 10.356787, 5.784075, 0.25689107, 0.39213058, 48.8055]
2025-08-07 07:44:09,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 111.0, 31.0, 120.0, 15.0, 47.0, 33.0, 13.0, 15.0, 55.0]
2025-08-07 07:44:09,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 20 minutes, 14 seconds)
2025-08-07 07:45:44,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:45,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 34.94521 ± 49.975
2025-08-07 07:45:45,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.09687774, 6.0708127, 77.15882, 4.614976, 1.818679, 4.5347056, 5.6534295, 140.86731, 2.4779744, 106.15852]
2025-08-07 07:45:45,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [11.0, 17.0, 79.0, 19.0, 17.0, 23.0, 17.0, 114.0, 14.0, 94.0]
2025-08-07 07:45:45,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (34.95) for latency ExtremeClogL1U23
2025-08-07 07:45:45,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 18 minutes, 43 seconds)
2025-08-07 07:47:20,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:20,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 7.21636 ± 10.116
2025-08-07 07:47:20,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.4395568, 22.914537, 2.8079584, 29.822603, -2.8071914, 8.767517, 3.5869813, 1.0944322, 3.265776, -0.7286192]
2025-08-07 07:47:20,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 50.0, 13.0, 52.0, 12.0, 46.0, 18.0, 17.0, 19.0, 19.0]
2025-08-07 07:47:20,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 17 minutes, 4 seconds)
2025-08-07 07:48:55,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:48:56,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 12.42120 ± 13.986
2025-08-07 07:48:56,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [15.071535, 1.6952552, -1.6192105, 4.1997643, 34.438625, 3.739333, 21.66042, 5.1389327, 39.406895, 0.4804039]
2025-08-07 07:48:56,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [42.0, 17.0, 15.0, 18.0, 80.0, 13.0, 53.0, 28.0, 53.0, 13.0]
2025-08-07 07:48:56,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 15 minutes, 23 seconds)
2025-08-07 07:50:31,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:31,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 15.14952 ± 14.401
2025-08-07 07:50:31,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [6.9100256, 6.332429, 5.603185, 3.1840675, 1.0397713, 25.210625, 15.836937, 37.522617, 6.199272, 43.656216]
2025-08-07 07:50:31,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 23.0, 24.0, 23.0, 14.0, 50.0, 25.0, 63.0, 34.0, 67.0]
2025-08-07 07:50:31,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 13 minutes, 40 seconds)
2025-08-07 07:52:07,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:08,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 30.30164 ± 53.519
2025-08-07 07:52:08,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.5631007, 96.16549, 3.1591458, 166.115, 18.078932, -4.758867, 0.39459783, 5.6746564, 23.324507, -6.7002]
2025-08-07 07:52:08,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 117.0, 17.0, 123.0, 38.0, 22.0, 15.0, 33.0, 39.0, 12.0]
2025-08-07 07:52:08,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 12 minutes, 23 seconds)
2025-08-07 07:53:42,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:53:43,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 14.87893 ± 22.631
2025-08-07 07:53:43,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [78.26115, 13.833612, 15.665115, 25.639284, 4.6447086, 0.10825904, 0.40693623, -1.8633072, 9.052683, 3.0408478]
2025-08-07 07:53:43,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [103.0, 44.0, 52.0, 43.0, 20.0, 17.0, 22.0, 9.0, 24.0, 17.0]
2025-08-07 07:53:43,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 10 minutes, 32 seconds)
2025-08-07 07:55:18,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:19,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 41.40338 ± 52.138
2025-08-07 07:55:19,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [125.47171, 84.463196, 11.024424, 144.06644, 5.9793797, 6.0182195, 10.778681, -1.2413435, 12.119604, 15.353504]
2025-08-07 07:55:19,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [115.0, 83.0, 28.0, 119.0, 23.0, 16.0, 36.0, 13.0, 25.0, 41.0]
2025-08-07 07:55:19,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (41.40) for latency ExtremeClogL1U23
2025-08-07 07:55:19,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 9 minutes, 6 seconds)
2025-08-07 07:56:54,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:55,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 48.72609 ± 85.150
2025-08-07 07:56:55,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-7.691904, 46.42876, 215.45232, 0.6978004, 217.9745, -2.2156942, 8.098367, 2.041189, 1.7610866, 4.714485]
2025-08-07 07:56:55,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 61.0, 150.0, 24.0, 135.0, 28.0, 24.0, 13.0, 15.0, 18.0]
2025-08-07 07:56:55,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (48.73) for latency ExtremeClogL1U23
2025-08-07 07:56:55,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 7 minutes, 45 seconds)
2025-08-07 07:58:31,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:58:31,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 12.39977 ± 20.378
2025-08-07 07:58:31,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.9660573, -3.412153, 2.7826495, 4.101691, 7.848938, 3.3804615, 11.936341, 69.70424, 1.7531645, 23.936283]
2025-08-07 07:58:31,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 17.0, 13.0, 16.0, 20.0, 15.0, 25.0, 63.0, 14.0, 49.0]
2025-08-07 07:58:31,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 6 minutes, 21 seconds)
2025-08-07 08:00:06,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:00:07,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 20.57819 ± 53.728
2025-08-07 08:00:07,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [10.24551, 1.9333005, -1.5223764, 181.01472, 6.776642, -4.077612, -0.9820762, 5.0590773, 11.010515, -3.6758134]
2025-08-07 08:00:07,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 13.0, 13.0, 175.0, 17.0, 22.0, 22.0, 15.0, 22.0, 18.0]
2025-08-07 08:00:07,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 4 minutes, 28 seconds)
2025-08-07 08:01:42,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:43,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 23.07326 ± 38.482
2025-08-07 08:01:43,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.25436684, -1.095098, 64.50213, 6.6629224, 0.49730146, 9.02705, 5.753937, 122.96045, -3.4117546, 26.089994]
2025-08-07 08:01:43,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 14.0, 86.0, 19.0, 18.0, 22.0, 20.0, 86.0, 11.0, 85.0]
2025-08-07 08:01:43,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 3 minutes, 12 seconds)
2025-08-07 08:03:18,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:19,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 77.34997 ± 96.586
2025-08-07 08:03:19,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.157221, 1.5075576, -1.7180531, 283.8443, 23.795197, 149.87468, 3.7411778, 211.10052, 56.494698, 41.702404]
2025-08-07 08:03:19,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 30.0, 183.0, 44.0, 155.0, 21.0, 209.0, 94.0, 72.0]
2025-08-07 08:03:19,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (77.35) for latency ExtremeClogL1U23
2025-08-07 08:03:19,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 1 minute, 45 seconds)
2025-08-07 08:04:55,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:55,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 21.84649 ± 28.267
2025-08-07 08:04:55,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [83.97833, 5.258104, 4.53526, 14.4023695, 63.633583, -5.666567, 7.671661, 12.74166, -1.9816949, 33.892124]
2025-08-07 08:04:55,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [75.0, 24.0, 15.0, 75.0, 63.0, 20.0, 20.0, 49.0, 14.0, 63.0]
2025-08-07 08:04:55,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 8 seconds)
2025-08-07 08:06:31,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:31,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 10.80172 ± 22.181
2025-08-07 08:06:31,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-4.595101, 3.320128, 2.6882362, 3.703685, 2.5186017, 16.332302, -6.4439034, -1.7975205, 19.154121, 73.13669]
2025-08-07 08:06:31,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 22.0, 12.0, 19.0, 13.0, 32.0, 25.0, 11.0, 55.0, 85.0]
2025-08-07 08:06:31,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 58 minutes, 22 seconds)
2025-08-07 08:08:07,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:08:08,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 11.66489 ± 22.218
2025-08-07 08:08:08,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [19.428247, 1.0670207, 1.9738111, 16.641932, -0.8128482, -1.0567409, -2.0122063, 2.5715554, 3.9523213, 74.89586]
2025-08-07 08:08:08,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [55.0, 15.0, 20.0, 58.0, 15.0, 22.0, 24.0, 15.0, 16.0, 114.0]
2025-08-07 08:08:08,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 57 minutes, 4 seconds)
2025-08-07 08:09:42,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:43,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 62.06786 ± 98.119
2025-08-07 08:09:43,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [121.70551, 8.83671, -0.63474935, 9.817684, 311.0176, 151.36447, 7.834362, 6.2280784, 7.601443, -3.0925097]
2025-08-07 08:09:43,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 19.0, 23.0, 25.0, 176.0, 91.0, 19.0, 20.0, 25.0, 14.0]
2025-08-07 08:09:43,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 55 minutes, 17 seconds)
2025-08-07 08:11:19,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:20,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 15.71151 ± 31.871
2025-08-07 08:11:20,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [10.581995, 12.473919, 2.512832, 110.618675, 3.817923, 7.693642, 3.8679948, 1.3502411, 4.983358, -0.7854622]
2025-08-07 08:11:20,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 50.0, 25.0, 140.0, 20.0, 32.0, 23.0, 24.0, 25.0, 12.0]
2025-08-07 08:11:20,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 53 minutes, 43 seconds)
2025-08-07 08:12:56,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:56,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 8.33966 ± 15.185
2025-08-07 08:12:56,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.7554078, 3.224224, 4.8101754, 14.508012, 51.557415, 5.253661, -1.4546092, -1.6405704, 7.2133527, 2.6803713]
2025-08-07 08:12:56,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 19.0, 15.0, 38.0, 59.0, 20.0, 24.0, 24.0, 24.0, 17.0]
2025-08-07 08:12:56,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 52 minutes, 7 seconds)
2025-08-07 08:14:32,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:33,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 26.46691 ± 35.699
2025-08-07 08:14:33,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.4339046, 37.551132, -6.3840537, 8.30229, 19.68616, 30.16578, 1.6094589, 54.49598, 3.6072845, 118.06897]
2025-08-07 08:14:33,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 57.0, 20.0, 24.0, 52.0, 59.0, 13.0, 99.0, 15.0, 102.0]
2025-08-07 08:14:33,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 50 minutes, 53 seconds)
2025-08-07 08:16:10,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:11,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 55.04320 ± 107.300
2025-08-07 08:16:11,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [269.85312, 5.8382297, 3.4716537, -2.1636045, -0.012353362, 269.16803, 3.9630158, 7.4933205, -1.2525945, -5.926751]
2025-08-07 08:16:11,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 25.0, 26.0, 11.0, 16.0, 146.0, 15.0, 18.0, 18.0, 19.0]
2025-08-07 08:16:11,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 49 minutes, 26 seconds)
2025-08-07 08:17:47,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:47,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 39.56115 ± 105.593
2025-08-07 08:17:47,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.7966673, 15.610292, 356.00018, 5.805205, -1.6177913, -0.12911105, 9.350578, 4.479868, 0.36592618, 4.9496684]
2025-08-07 08:17:47,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [11.0, 44.0, 194.0, 17.0, 16.0, 17.0, 25.0, 19.0, 24.0, 21.0]
2025-08-07 08:17:47,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 48 minutes, 10 seconds)
2025-08-07 08:19:23,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:24,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 39.84601 ± 57.934
2025-08-07 08:19:24,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [14.936569, 15.479751, 3.2802022, 97.27959, 36.906715, -0.38279667, 0.8911712, 190.91228, 41.10036, -1.9437392]
2025-08-07 08:19:24,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [39.0, 129.0, 22.0, 116.0, 111.0, 19.0, 21.0, 126.0, 109.0, 16.0]
2025-08-07 08:19:24,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 46 minutes, 36 seconds)
2025-08-07 08:21:01,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:01,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 49.15152 ± 85.654
2025-08-07 08:21:01,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.5033813, 10.483126, 0.94633144, 1.8556913, 8.07645, 20.356718, 7.858937, 195.10982, 242.46834, 1.8563837]
2025-08-07 08:21:01,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 20.0, 21.0, 14.0, 39.0, 43.0, 18.0, 124.0, 122.0, 15.0]
2025-08-07 08:21:01,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 45 minutes, 12 seconds)
2025-08-07 08:22:38,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:22:39,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 84.66551 ± 121.302
2025-08-07 08:22:39,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.4670532, 288.93912, 206.49792, 13.561474, 302.01328, 2.9047918, -2.0269125, 1.563977, 35.610477, -3.876017]
2025-08-07 08:22:39,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 197.0, 233.0, 24.0, 150.0, 22.0, 11.0, 15.0, 58.0, 25.0]
2025-08-07 08:22:39,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (84.67) for latency ExtremeClogL1U23
2025-08-07 08:22:39,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 43 minutes, 34 seconds)
2025-08-07 08:24:15,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:24:16,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 92.61275 ± 139.143
2025-08-07 08:24:16,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.3162181, 324.53717, 1.1943525, 347.1472, 0.26692542, 2.263165, 6.189962, 9.276631, 1.845594, 231.09024]
2025-08-07 08:24:16,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 175.0, 14.0, 178.0, 11.0, 14.0, 23.0, 20.0, 17.0, 155.0]
2025-08-07 08:24:16,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (92.61) for latency ExtremeClogL1U23
2025-08-07 08:24:16,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 41 minutes, 54 seconds)
2025-08-07 08:25:52,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:53,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 41.11316 ± 101.937
2025-08-07 08:25:53,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.7940874, 4.8373017, 8.921562, 21.992094, 346.51746, 6.060545, 7.4953394, 2.7628627, 3.961102, 4.7892485]
2025-08-07 08:25:53,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 17.0, 20.0, 38.0, 198.0, 16.0, 69.0, 20.0, 15.0, 17.0]
2025-08-07 08:25:53,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 40 minutes, 21 seconds)
2025-08-07 08:27:29,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:30,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 32.86480 ± 84.341
2025-08-07 08:27:30,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [10.991895, -2.0913856, 3.0568562, 7.354977, 3.377589, 2.4748776, -1.7092216, 15.9222145, 3.8620234, 285.40817]
2025-08-07 08:27:30,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 22.0, 14.0, 21.0, 23.0, 15.0, 15.0, 46.0, 17.0, 133.0]
2025-08-07 08:27:30,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 38 minutes, 40 seconds)
2025-08-07 08:29:07,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:08,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 100.80482 ± 143.791
2025-08-07 08:29:08,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.5911808, 291.76788, -0.31819808, 318.92, 347.1179, 30.56408, 4.424371, -0.08866106, 6.9315705, 6.1379924]
2025-08-07 08:29:08,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 261.0, 12.0, 225.0, 210.0, 55.0, 26.0, 21.0, 19.0, 18.0]
2025-08-07 08:29:08,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (100.80) for latency ExtremeClogL1U23
2025-08-07 08:29:08,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 37 minutes, 20 seconds)
2025-08-07 08:30:44,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:30:44,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 87.45079 ± 155.275
2025-08-07 08:30:44,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.5952152, 4.3588753, 5.5183105, 101.19129, -3.9439049, 3.1041648, 475.04977, 285.5801, 5.9055543, -0.6611116]
2025-08-07 08:30:44,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 16.0, 18.0, 139.0, 20.0, 21.0, 260.0, 163.0, 17.0, 17.0]
2025-08-07 08:30:45,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 35 minutes, 34 seconds)
2025-08-07 08:32:20,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:32:21,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 112.61208 ± 218.153
2025-08-07 08:32:21,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [615.12115, 473.29764, 1.6700834, 1.0466471, 1.5853753, 3.2425666, 2.19749, 3.6795547, 14.210268, 10.070122]
2025-08-07 08:32:21,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [335.0, 247.0, 21.0, 15.0, 17.0, 19.0, 22.0, 23.0, 35.0, 23.0]
2025-08-07 08:32:21,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (112.61) for latency ExtremeClogL1U23
2025-08-07 08:32:21,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 33 minutes, 50 seconds)
2025-08-07 08:33:59,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:59,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 18.75782 ± 47.988
2025-08-07 08:33:59,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.4635936, 3.9495242, 3.2513993, 3.5048769, 4.1123548, -1.2142081, 0.6229153, 162.56831, 7.505198, 1.814268]
2025-08-07 08:33:59,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 16.0, 21.0, 18.0, 14.0, 33.0, 16.0, 129.0, 24.0, 12.0]
2025-08-07 08:33:59,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 32 minutes, 22 seconds)
2025-08-07 08:35:36,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:37,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 27.63252 ± 58.390
2025-08-07 08:35:37,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.6722214, 0.44519448, 2.4997406, 5.348548, 44.768612, 2.2829115, 197.69427, -2.3759518, 24.252157, -1.2625242]
2025-08-07 08:35:37,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 19.0, 14.0, 22.0, 90.0, 21.0, 110.0, 12.0, 58.0, 12.0]
2025-08-07 08:35:37,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 30 minutes, 57 seconds)
2025-08-07 08:37:13,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:15,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 157.06645 ± 168.559
2025-08-07 08:37:15,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [29.624096, 3.0797234, 423.355, 205.0175, -2.3328862, 139.444, 405.68512, 352.90735, 7.4324565, 6.452195]
2025-08-07 08:37:15,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [58.0, 19.0, 329.0, 120.0, 14.0, 89.0, 258.0, 218.0, 20.0, 18.0]
2025-08-07 08:37:15,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (157.07) for latency ExtremeClogL1U23
2025-08-07 08:37:15,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 29 minutes, 9 seconds)
2025-08-07 08:38:51,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:52,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 39.46072 ± 96.425
2025-08-07 08:38:52,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [19.805641, 1.6394256, 3.1942413, 0.4247863, 16.365099, 328.08234, 12.518762, 8.310437, 0.83518434, 3.431347]
2025-08-07 08:38:52,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [42.0, 15.0, 14.0, 23.0, 26.0, 191.0, 20.0, 30.0, 17.0, 20.0]
2025-08-07 08:38:52,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 40 seconds)
2025-08-07 08:40:28,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:40:29,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 107.26701 ± 159.638
2025-08-07 08:40:29,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-3.6274433, 3.9001796, 2.818712, 10.577156, 411.0664, 8.140269, 15.496783, 0.55692166, 366.4028, 257.3384]
2025-08-07 08:40:29,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 25.0, 13.0, 37.0, 198.0, 22.0, 48.0, 14.0, 171.0, 133.0]
2025-08-07 08:40:29,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 26 minutes, 5 seconds)
2025-08-07 08:42:04,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:42:05,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 44.06965 ± 113.975
2025-08-07 08:42:05,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.3570201, 36.947666, 1.5244856, 384.54037, -2.0818849, 3.0377488, 6.231822, 2.7925804, 4.3617506, 1.9849766]
2025-08-07 08:42:05,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 56.0, 14.0, 165.0, 10.0, 15.0, 17.0, 16.0, 23.0, 21.0]
2025-08-07 08:42:05,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 10 seconds)
2025-08-07 08:43:43,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:45,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 147.07870 ± 220.619
2025-08-07 08:43:45,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [5.270901, 622.6853, 4.932408, 3.3391654, 67.80472, -0.2981461, 288.92874, -1.1327738, 2.6402314, 476.61664]
2025-08-07 08:43:45,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 373.0, 23.0, 19.0, 119.0, 10.0, 155.0, 23.0, 21.0, 281.0]
2025-08-07 08:43:45,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 22 minutes, 55 seconds)
2025-08-07 08:45:21,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:22,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 86.83182 ± 145.324
2025-08-07 08:45:22,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.7649558, 2.5618608, 8.284646, -0.52239597, 207.3248, 184.43925, 6.4209666, 5.029777, 457.64258, -0.098312005]
2025-08-07 08:45:22,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 20.0, 18.0, 11.0, 150.0, 115.0, 22.0, 17.0, 224.0, 34.0]
2025-08-07 08:45:22,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 12 seconds)
2025-08-07 08:46:58,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:59,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 141.02493 ± 121.749
2025-08-07 08:46:59,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.5092252, 6.914599, 189.51982, 169.89543, 1.4585651, 27.055742, 309.0944, 179.74986, 173.4918, 351.55984]
2025-08-07 08:46:59,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 16.0, 117.0, 127.0, 13.0, 41.0, 174.0, 107.0, 88.0, 195.0]
2025-08-07 08:46:59,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 19 minutes, 34 seconds)
2025-08-07 08:48:35,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:36,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 63.34562 ± 121.509
2025-08-07 08:48:36,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [12.741972, 5.458977, -1.814506, 0.24597533, 338.0208, -2.3644552, 6.3420362, 3.9400501, 270.65982, 0.22546303]
2025-08-07 08:48:36,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 9.0, 15.0, 168.0, 10.0, 36.0, 18.0, 143.0, 17.0]
2025-08-07 08:48:36,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 17 minutes, 57 seconds)
2025-08-07 08:50:12,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:13,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 75.07304 ± 148.716
2025-08-07 08:50:13,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.3737705, 1.309413, 5.6236267, 2.1710765, 458.96454, 20.971245, 0.65667534, 0.44632733, 0.5509404, 257.6628]
2025-08-07 08:50:13,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 20.0, 12.0, 248.0, 86.0, 15.0, 14.0, 13.0, 145.0]
2025-08-07 08:50:13,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 16 minutes, 29 seconds)
2025-08-07 08:51:50,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:51,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 156.75056 ± 200.224
2025-08-07 08:51:51,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-5.6653843, 9.046299, 8.703484, 304.77356, 420.487, -0.34752545, 276.68896, 4.207083, 547.85, 1.7621521]
2025-08-07 08:51:51,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 36.0, 233.0, 233.0, 12.0, 165.0, 18.0, 343.0, 14.0]
2025-08-07 08:51:51,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 14 minutes, 36 seconds)
2025-08-07 08:53:31,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:31,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 46.43245 ± 82.170
2025-08-07 08:53:31,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.633231, 215.11667, 29.535435, 2.8313575, -1.8758746, -1.0064982, 3.9056103, 204.58952, 5.4863124, 3.1087937]
2025-08-07 08:53:31,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 181.0, 51.0, 13.0, 10.0, 12.0, 19.0, 133.0, 57.0, 16.0]
2025-08-07 08:53:31,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 13 minutes, 24 seconds)
2025-08-07 08:55:06,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:07,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 153.29440 ± 208.941
2025-08-07 08:55:07,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.0864134, 111.18487, 0.995746, 391.0738, 7.821516, 482.31833, 522.8795, 4.73471, -2.6540217, 10.503187]
2025-08-07 08:55:07,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 224.0, 15.0, 170.0, 22.0, 251.0, 296.0, 18.0, 12.0, 26.0]
2025-08-07 08:55:07,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 11 minutes, 39 seconds)
2025-08-07 08:56:44,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:56:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 44.42065 ± 126.250
2025-08-07 08:56:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.3154771, 422.99106, 1.6348848, 0.16295329, 2.0036917, 6.0063934, 9.859876, 2.9415338, 2.5951352, -6.304498]
2025-08-07 08:56:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 179.0, 18.0, 25.0, 18.0, 17.0, 22.0, 13.0, 13.0, 30.0]
2025-08-07 08:56:45,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 4 seconds)
2025-08-07 08:58:22,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:23,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 136.57889 ± 229.283
2025-08-07 08:58:23,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [254.69244, 8.109811, 7.1348224, -0.601767, 281.9294, 68.68339, -4.333498, 0.3200776, 750.3589, -0.50467503]
2025-08-07 08:58:23,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [193.0, 24.0, 33.0, 22.0, 141.0, 104.0, 20.0, 16.0, 398.0, 12.0]
2025-08-07 08:58:23,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 8 minutes, 36 seconds)
2025-08-07 08:59:59,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:01,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 188.49464 ± 215.825
2025-08-07 09:00:01,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-3.8687139, 1.6051443, 315.52274, 279.6602, 660.3343, 357.7126, 281.59396, -1.5720415, -1.0751594, -4.9666195]
2025-08-07 09:00:01,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 20.0, 207.0, 154.0, 368.0, 209.0, 148.0, 35.0, 14.0, 21.0]
2025-08-07 09:00:01,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (188.49) for latency ExtremeClogL1U23
2025-08-07 09:00:01,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 6 minutes, 55 seconds)
2025-08-07 09:01:39,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:40,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 93.51424 ± 163.851
2025-08-07 09:01:40,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.059901055, 5.3639245, -6.303706, 103.53164, 2.8316875, 392.46484, 4.7025704, -3.4082773, -1.2055542, 437.10535]
2025-08-07 09:01:40,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 15.0, 10.0, 166.0, 17.0, 240.0, 16.0, 15.0, 20.0, 201.0]
2025-08-07 09:01:40,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 12 seconds)
2025-08-07 09:03:16,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:17,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 46.94136 ± 87.142
2025-08-07 09:03:17,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [123.965225, -5.402606, 3.7858343, 282.47498, 51.866276, -1.1991692, 7.181434, 4.7424793, 1.8609214, 0.1382038]
2025-08-07 09:03:17,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 20.0, 20.0, 150.0, 108.0, 21.0, 29.0, 17.0, 16.0, 11.0]
2025-08-07 09:03:17,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 3 minutes, 38 seconds)
2025-08-07 09:04:55,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:04:56,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 102.46018 ± 159.828
2025-08-07 09:04:56,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [522.951, 3.0927603, 1.8523968, 86.68224, 232.73254, 1.2693413, 1.652073, 13.238774, 2.7839925, 158.34673]
2025-08-07 09:04:56,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 15.0, 19.0, 110.0, 138.0, 14.0, 16.0, 26.0, 20.0, 87.0]
2025-08-07 09:04:56,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 2 minutes, 14 seconds)
2025-08-07 09:06:31,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:32,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 85.10609 ± 123.413
2025-08-07 09:06:32,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [8.19015, 4.464617, 286.2485, 3.3413455, 12.955311, 1.7503794, 253.24403, 279.96503, 3.3286645, -2.427056]
2025-08-07 09:06:32,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 17.0, 143.0, 33.0, 25.0, 22.0, 156.0, 137.0, 18.0, 14.0]
2025-08-07 09:06:32,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 17 seconds)
2025-08-07 09:08:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:10,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 49.49866 ± 103.161
2025-08-07 09:08:10,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.0746865, 5.704753, 11.162999, 338.9729, 2.8965306, 2.3995721, 125.61565, -0.29232705, -2.297316, 3.7491138]
2025-08-07 09:08:10,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 19.0, 24.0, 174.0, 19.0, 22.0, 139.0, 14.0, 24.0, 21.0]
2025-08-07 09:08:10,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 58 minutes, 40 seconds)
2025-08-07 09:09:47,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:09:48,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 82.77565 ± 165.530
2025-08-07 09:09:48,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [443.9434, -0.25611347, 2.1675384, 5.2702546, 8.003461, -2.6578267, 381.14084, 0.9525576, -4.248213, -6.559309]
2025-08-07 09:09:48,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [235.0, 15.0, 14.0, 18.0, 20.0, 9.0, 162.0, 19.0, 22.0, 18.0]
2025-08-07 09:09:48,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 56 minutes, 52 seconds)
2025-08-07 09:11:25,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:11:25,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 28.89739 ± 80.888
2025-08-07 09:11:25,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.3306679, -2.6328847, 2.2870321, 3.3140616, -1.4168744, 8.297257, 271.3107, -3.0361128, 7.853618, 1.6664513]
2025-08-07 09:11:25,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 14.0, 15.0, 16.0, 22.0, 20.0, 145.0, 17.0, 31.0, 16.0]
2025-08-07 09:11:25,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 55 minutes, 21 seconds)
2025-08-07 09:13:03,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:13:04,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 166.58054 ± 213.050
2025-08-07 09:13:04,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [248.9632, 8.4994, 394.66766, -1.924099, 1.534783, 2.9399, 499.64655, -0.85971487, 513.72394, -1.3862317]
2025-08-07 09:13:04,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [157.0, 17.0, 199.0, 16.0, 22.0, 18.0, 282.0, 26.0, 248.0, 13.0]
2025-08-07 09:13:04,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 53 minutes, 41 seconds)
2025-08-07 09:14:43,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:43,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 30.54096 ± 74.938
2025-08-07 09:14:43,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-3.6861987, 5.684076, 2.5204859, 10.635865, -1.0168145, 254.84091, 8.295955, 13.322642, 4.4390154, 10.373666]
2025-08-07 09:14:43,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 13.0, 19.0, 10.0, 180.0, 19.0, 25.0, 21.0, 21.0]
2025-08-07 09:14:43,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes, 24 seconds)
2025-08-07 09:16:22,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:22,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 38.58657 ± 71.340
2025-08-07 09:16:22,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.363175, 8.196524, 0.31787995, 187.01428, 3.3864384, 175.17151, 3.668186, 5.0648465, 0.7507838, -0.067921214]
2025-08-07 09:16:22,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 18.0, 13.0, 94.0, 23.0, 89.0, 15.0, 15.0, 13.0, 25.0]
2025-08-07 09:16:22,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 50 minutes, 51 seconds)
2025-08-07 09:18:00,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:18:01,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 84.53642 ± 163.006
2025-08-07 09:18:01,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [25.302868, 3.940843, 3.3220315, 550.3005, -1.3409748, 1.5795124, 133.20193, 0.321664, 2.5483816, 126.187454]
2025-08-07 09:18:01,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [62.0, 16.0, 14.0, 348.0, 24.0, 24.0, 194.0, 14.0, 14.0, 91.0]
2025-08-07 09:18:01,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 49 minutes, 18 seconds)
2025-08-07 09:19:36,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:38,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 240.77914 ± 242.425
2025-08-07 09:19:38,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.2265942, 512.3445, 479.40723, 1.5744104, -5.9259763, 509.6523, 0.5788768, 495.2004, 412.43832, 0.29490376]
2025-08-07 09:19:38,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 255.0, 228.0, 11.0, 24.0, 246.0, 17.0, 223.0, 203.0, 25.0]
2025-08-07 09:19:38,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (240.78) for latency ExtremeClogL1U23
2025-08-07 09:19:38,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 47 minutes, 36 seconds)
2025-08-07 09:21:15,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:17,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 216.67252 ± 278.164
2025-08-07 09:21:17,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [447.8503, 2.0470278, 763.0035, 373.83737, 1.5650473, -1.1200496, 0.74614495, 566.246, 6.2770534, 6.2726064]
2025-08-07 09:21:17,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 20.0, 518.0, 164.0, 17.0, 14.0, 16.0, 239.0, 20.0, 19.0]
2025-08-07 09:21:17,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 45 minutes, 56 seconds)
2025-08-07 09:22:55,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:22:56,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 18.75015 ± 26.130
2025-08-07 09:22:56,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.3941674, 9.888745, 2.6803222, 16.44683, 7.1116524, 91.69525, 2.3802147, 4.5828643, 35.426384, 14.895079]
2025-08-07 09:22:56,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 23.0, 22.0, 81.0, 20.0, 131.0, 20.0, 20.0, 70.0, 62.0]
2025-08-07 09:22:56,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 44 minutes, 17 seconds)
2025-08-07 09:24:33,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:24:34,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 159.17288 ± 248.842
2025-08-07 09:24:34,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.77075505, 2.3908756, 518.50684, -3.4113567, 688.3939, 1.9828243, 5.5367575, 13.38759, 359.97867, 5.7335095]
2025-08-07 09:24:34,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 16.0, 244.0, 15.0, 359.0, 18.0, 23.0, 24.0, 168.0, 17.0]
2025-08-07 09:24:34,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 42 minutes, 39 seconds)
2025-08-07 09:26:11,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:26:13,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 173.94147 ± 220.899
2025-08-07 09:26:13,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [10.449758, 453.98224, 5.5184665, 4.914584, -0.8901879, 1.2901337, 509.29147, 217.72775, 10.477742, 526.6527]
2025-08-07 09:26:13,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 220.0, 19.0, 20.0, 13.0, 14.0, 243.0, 113.0, 24.0, 268.0]
2025-08-07 09:26:13,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 59 seconds)
2025-08-07 09:27:51,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:52,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 137.73392 ± 216.893
2025-08-07 09:27:52,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.082815, -3.7808368, 7.395614, 3.738916, 268.34674, 3.030789, 521.6761, 10.007093, 561.9332, 3.908693]
2025-08-07 09:27:52,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 20.0, 17.0, 16.0, 136.0, 13.0, 234.0, 24.0, 270.0, 14.0]
2025-08-07 09:27:52,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 31 seconds)
2025-08-07 09:29:29,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:29:30,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 93.22173 ± 181.002
2025-08-07 09:29:30,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [10.019712, 0.46681136, -3.6902869, 6.5961514, 583.96814, 4.9355636, 55.329945, 1.6527163, 8.00001, 264.9385]
2025-08-07 09:29:30,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 15.0, 17.0, 22.0, 271.0, 18.0, 94.0, 13.0, 21.0, 118.0]
2025-08-07 09:29:30,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 48 seconds)
2025-08-07 09:31:09,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:31:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 45.86530 ± 84.559
2025-08-07 09:31:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.6284446, 2.7644658, 6.2004714, 241.4034, 5.3269925, 0.74061376, 8.774996, 5.548471, 4.988907, 184.53316]
2025-08-07 09:31:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 31.0, 16.0, 133.0, 22.0, 17.0, 20.0, 18.0, 18.0, 115.0]
2025-08-07 09:31:09,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 11 seconds)
2025-08-07 09:32:45,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:32:46,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 63.39103 ± 147.782
2025-08-07 09:32:46,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [496.66302, 4.5713005, 6.002135, -0.593391, 10.153549, 2.8297064, 1.2059945, 108.26785, 1.2528553, 3.5572486]
2025-08-07 09:32:46,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [227.0, 20.0, 18.0, 24.0, 22.0, 30.0, 14.0, 160.0, 14.0, 14.0]
2025-08-07 09:32:46,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 24 seconds)
2025-08-07 09:34:24,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:34:25,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 46.67610 ± 134.869
2025-08-07 09:34:25,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-4.367131, 10.96018, 0.5847985, 451.0375, -5.040478, 6.8274903, 3.1279967, -1.310072, -0.029606266, 4.9703174]
2025-08-07 09:34:25,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 23.0, 11.0, 181.0, 22.0, 17.0, 14.0, 22.0, 20.0, 18.0]
2025-08-07 09:34:25,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 47 seconds)
2025-08-07 09:36:01,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:36:01,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 80.02061 ± 122.754
2025-08-07 09:36:01,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [338.6927, -3.0807076, 6.6268, 4.9268985, -0.63466454, -0.17308368, 5.3365774, 224.22841, 222.74931, 1.5337888]
2025-08-07 09:36:01,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [274.0, 14.0, 21.0, 27.0, 13.0, 25.0, 17.0, 127.0, 120.0, 13.0]
2025-08-07 09:36:01,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes)
2025-08-07 09:37:40,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:37:42,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 276.14838 ± 292.063
2025-08-07 09:37:42,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [619.3952, 413.43207, 468.98022, 793.8534, 2.109359, -2.6093414, 0.5969729, 459.10986, 8.188791, -1.572632]
2025-08-07 09:37:42,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [277.0, 171.0, 221.0, 364.0, 24.0, 12.0, 27.0, 229.0, 22.0, 20.0]
2025-08-07 09:37:42,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (276.15) for latency ExtremeClogL1U23
2025-08-07 09:37:42,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 30 seconds)
2025-08-07 09:39:18,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:39:19,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 79.75490 ± 114.800
2025-08-07 09:39:19,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [74.40568, 1.5742013, 3.485126, 191.4773, 157.66714, 358.12753, 0.06497154, 5.00487, 6.584477, -0.8422202]
2025-08-07 09:39:19,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [120.0, 13.0, 15.0, 156.0, 96.0, 193.0, 10.0, 15.0, 26.0, 11.0]
2025-08-07 09:39:19,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 44 seconds)
2025-08-07 09:40:57,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:40:59,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 147.37849 ± 236.789
2025-08-07 09:40:59,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.1553493, 351.1378, 0.1184913, 55.594887, 0.39491695, 55.23826, 233.71626, 770.58923, 0.5009312, 2.3388693]
2025-08-07 09:40:59,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 215.0, 18.0, 99.0, 17.0, 88.0, 213.0, 389.0, 14.0, 23.0]
2025-08-07 09:40:59,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 16 seconds)
2025-08-07 09:42:36,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:42:36,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 91.75601 ± 180.941
2025-08-07 09:42:36,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-3.68232, 526.4865, 6.551012, 366.34824, 3.8584104, 2.114135, 1.0529392, -0.2219821, 7.46768, 7.585461]
2025-08-07 09:42:36,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 232.0, 23.0, 148.0, 16.0, 21.0, 15.0, 26.0, 16.0, 23.0]
2025-08-07 09:42:36,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 35 seconds)
2025-08-07 09:44:14,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:44:15,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 71.17527 ± 138.095
2025-08-07 09:44:15,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [380.2318, 2.4308589, 3.9401224, -4.570811, 1.4733753, 14.954863, -1.314567, 2.0967462, 1.8552467, 310.655]
2025-08-07 09:44:15,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [182.0, 14.0, 25.0, 20.0, 14.0, 25.0, 18.0, 14.0, 14.0, 212.0]
2025-08-07 09:44:15,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 2 seconds)
2025-08-07 09:45:53,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:45:54,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 161.77577 ± 244.270
2025-08-07 09:45:54,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.5989573, 2.0281427, 3.1566427, 6.908276, 0.57944727, 604.97815, 500.0194, 0.20933552, 4.0562468, 492.223]
2025-08-07 09:45:54,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 20.0, 21.0, 15.0, 306.0, 212.0, 14.0, 25.0, 223.0]
2025-08-07 09:45:54,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 19 seconds)
2025-08-07 09:47:31,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:47:31,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 16.77464 ± 52.979
2025-08-07 09:47:31,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.3453891, -1.1296842, -10.893504, -0.7224, -0.005969679, 175.30913, -2.8800876, 1.8441036, 2.015453, 0.86396456]
2025-08-07 09:47:31,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 15.0, 21.0, 13.0, 18.0, 102.0, 20.0, 12.0, 32.0, 12.0]
2025-08-07 09:47:31,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 42 seconds)
2025-08-07 09:49:09,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:49:09,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 53.78868 ± 151.538
2025-08-07 09:49:09,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [8.415102, -1.9352819, 10.659056, 4.8793883, 508.11374, -7.7281666, 5.7903504, -1.7387711, 8.63433, 2.7970588]
2025-08-07 09:49:09,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 12.0, 22.0, 24.0, 239.0, 70.0, 16.0, 16.0, 23.0, 16.0]
2025-08-07 09:49:09,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 59 seconds)
2025-08-07 09:50:48,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:50:49,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 55.46021 ± 93.571
2025-08-07 09:50:49,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [153.86775, 84.41096, -0.98462933, 3.631314, 295.66257, 0.5595281, 5.589573, 2.6473694, 5.62018, 3.5974698]
2025-08-07 09:50:49,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [111.0, 71.0, 15.0, 25.0, 144.0, 13.0, 17.0, 20.0, 25.0, 19.0]
2025-08-07 09:50:49,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 24 seconds)
2025-08-07 09:52:26,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:52:27,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 141.09814 ± 289.272
2025-08-07 09:52:27,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.6516069, 518.31146, -1.2153438, 2.2762804, -0.4257723, -0.35721782, 5.295019, 4.6149244, 4.9642663, 875.86615]
2025-08-07 09:52:27,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 227.0, 13.0, 15.0, 12.0, 9.0, 19.0, 16.0, 25.0, 532.0]
2025-08-07 09:52:27,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 45 seconds)
2025-08-07 09:54:04,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:54:06,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 197.52167 ± 313.043
2025-08-07 09:54:06,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [5.7704797, 2.4362683, 6.207859, 908.558, -1.2888002, 1.3721443, 466.55957, 3.1382723, 571.61334, 10.849513]
2025-08-07 09:54:06,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 21.0, 20.0, 374.0, 22.0, 16.0, 185.0, 17.0, 294.0, 19.0]
2025-08-07 09:54:06,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 7 seconds)
2025-08-07 09:55:43,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:55:44,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 74.44133 ± 139.360
2025-08-07 09:55:44,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.4565015, 6.2326317, 47.842213, 188.33218, 1.3132498, 458.22665, -0.014544741, 42.994183, 1.7150037, 0.22818148]
2025-08-07 09:55:44,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 25.0, 82.0, 109.0, 21.0, 207.0, 16.0, 75.0, 20.0, 21.0]
2025-08-07 09:55:44,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 29 seconds)
2025-08-07 09:57:21,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:57:21,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 93.77487 ± 150.663
2025-08-07 09:57:21,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-5.5186033, 4.561319, 10.595723, -1.4089642, 218.44806, -3.6269007, -1.0029827, 4.552419, 271.47403, 439.67462]
2025-08-07 09:57:21,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 19.0, 22.0, 11.0, 129.0, 16.0, 12.0, 15.0, 124.0, 198.0]
2025-08-07 09:57:21,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 50 seconds)
2025-08-07 09:58:59,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:59:00,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 201.99004 ± 248.581
2025-08-07 09:59:00,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [518.6174, -2.598024, 1.0077051, -1.438756, 2.261901, 428.87595, 2.00175, 522.9046, 547.3347, 0.93306744]
2025-08-07 09:59:00,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 21.0, 16.0, 10.0, 14.0, 176.0, 15.0, 263.0, 238.0, 17.0]
2025-08-07 09:59:00,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 11 seconds)
2025-08-07 10:00:38,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:00:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 314.26987 ± 417.494
2025-08-07 10:00:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [507.3242, -0.30349877, 661.68304, 9.272755, 299.90222, 1360.0408, 309.33466, -1.2480412, 2.3551137, -5.6624994]
2025-08-07 10:00:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [230.0, 32.0, 365.0, 22.0, 138.0, 570.0, 140.0, 24.0, 19.0, 16.0]
2025-08-07 10:00:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (314.27) for latency ExtremeClogL1U23
2025-08-07 10:00:40,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 33 seconds)
2025-08-07 10:02:17,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:02:18,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 154.31812 ± 231.425
2025-08-07 10:02:18,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.9898179, 2.9559627, -3.820079, 0.6541082, 539.254, 468.93433, 512.6719, 10.940433, 3.6880965, 3.9125733]
2025-08-07 10:02:18,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 17.0, 19.0, 19.0, 243.0, 207.0, 222.0, 21.0, 26.0, 13.0]
2025-08-07 10:02:19,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 55 seconds)
2025-08-07 10:04:02,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:04:03,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 148.32909 ± 393.653
2025-08-07 10:04:03,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1322.8047, 140.32375, 3.107177, -2.0605786, -0.9327659, 4.9307413, 9.142289, 3.03979, 1.6576359, 1.2782352]
2025-08-07 10:04:03,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [527.0, 100.0, 69.0, 20.0, 10.0, 24.0, 24.0, 19.0, 12.0, 11.0]
2025-08-07 10:04:03,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 19 seconds)
2025-08-07 10:05:38,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:05:39,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 74.04776 ± 161.330
2025-08-07 10:05:39,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.014768588, 149.78384, 6.9399905, 3.7294583, 30.296452, 6.197181, 5.373037, -3.7329738, 1.8713118, 540.0045]
2025-08-07 10:05:39,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 94.0, 22.0, 25.0, 75.0, 22.0, 20.0, 20.0, 14.0, 260.0]
2025-08-07 10:05:39,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 39 seconds)
2025-08-07 10:07:14,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:07:15,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 216.06050 ± 294.074
2025-08-07 10:07:15,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.1921343, 0.46142337, 4.380937, 683.93475, 11.431454, 703.31445, 581.6297, 1.4565201, 175.36212, -0.17427644]
2025-08-07 10:07:15,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 12.0, 26.0, 309.0, 21.0, 392.0, 264.0, 33.0, 94.0, 14.0]
2025-08-07 10:07:15,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1251 [DEBUG]: Training session finished
