2025-08-07 06:42:35,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc15-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:42:35,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc15-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:42:35,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x151402a3f9d0>}
2025-08-07 06:42:35,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 06:42:35,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1133 [INFO]: Creating new trainer
2025-08-07 06:42:35,826 baseline-bpql-noiseperc15-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=83, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 06:42:35,826 baseline-bpql-noiseperc15-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:42:37,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 06:42:37,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 06:44:05,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:44:05,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 22.54634 ± 9.429
2025-08-07 06:44:05,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [38.044666, 33.613075, 15.461041, 18.028204, 37.34833, 19.384056, 13.873869, 12.861154, 21.871239, 14.977781]
2025-08-07 06:44:05,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [36.0, 41.0, 17.0, 23.0, 38.0, 24.0, 21.0, 22.0, 33.0, 18.0]
2025-08-07 06:44:05,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (22.55) for latency ExtremeClogL1U23
2025-08-07 06:44:05,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 24 minutes, 49 seconds)
2025-08-07 06:45:40,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:45:40,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 85.09032 ± 87.555
2025-08-07 06:45:40,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [104.481186, 221.84984, 14.877994, 214.16306, 11.8273535, 37.78213, 201.5074, 14.88368, 17.191137, 12.339418]
2025-08-07 06:45:40,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 134.0, 22.0, 123.0, 16.0, 37.0, 136.0, 17.0, 18.0, 20.0]
2025-08-07 06:45:40,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (85.09) for latency ExtremeClogL1U23
2025-08-07 06:45:40,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 29 minutes, 34 seconds)
2025-08-07 06:47:15,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:16,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 101.63028 ± 87.012
2025-08-07 06:47:16,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [17.255062, 11.877798, 21.129923, 199.76578, 13.686514, 213.15054, 17.277044, 146.951, 175.30298, 199.90617]
2025-08-07 06:47:16,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 16.0, 24.0, 104.0, 15.0, 129.0, 21.0, 93.0, 97.0, 115.0]
2025-08-07 06:47:16,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (101.63) for latency ExtremeClogL1U23
2025-08-07 06:47:16,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 30 minutes, 11 seconds)
2025-08-07 06:48:50,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:48:51,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 68.57441 ± 84.447
2025-08-07 06:48:51,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [58.907375, 15.330795, 21.287426, 48.3349, 312.28683, 26.718626, 15.924946, 60.043167, 34.352325, 92.55769]
2025-08-07 06:48:51,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 18.0, 22.0, 69.0, 215.0, 35.0, 24.0, 57.0, 68.0, 83.0]
2025-08-07 06:48:51,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 29 minutes, 26 seconds)
2025-08-07 06:50:25,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:50:26,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 31.73032 ± 28.253
2025-08-07 06:50:26,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [31.94952, 14.437719, 113.317635, 37.13032, 23.966627, 29.785536, 19.38515, 12.378851, 17.265297, 17.68656]
2025-08-07 06:50:26,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [35.0, 15.0, 85.0, 44.0, 23.0, 25.0, 25.0, 21.0, 21.0, 20.0]
2025-08-07 06:50:26,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 28 minutes, 19 seconds)
2025-08-07 06:52:00,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:52:01,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 79.78371 ± 64.539
2025-08-07 06:52:01,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [90.706154, 181.16232, 40.09398, 80.1319, 62.265835, 209.90997, 9.483668, 86.59776, 18.72184, 18.76373]
2025-08-07 06:52:01,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 88.0, 43.0, 54.0, 54.0, 111.0, 17.0, 75.0, 24.0, 25.0]
2025-08-07 06:52:01,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 29 minutes, 6 seconds)
2025-08-07 06:53:36,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:37,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 53.21994 ± 45.371
2025-08-07 06:53:37,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [110.20002, 55.305687, 22.330477, 9.214402, 12.65668, 81.915764, 75.2514, 12.459086, 10.287656, 142.57826]
2025-08-07 06:53:37,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 51.0, 23.0, 12.0, 18.0, 64.0, 60.0, 20.0, 22.0, 80.0]
2025-08-07 06:53:37,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 27 minutes, 39 seconds)
2025-08-07 06:55:11,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:12,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 64.02404 ± 54.579
2025-08-07 06:55:12,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [39.41783, 11.947359, 90.02143, 70.15778, 196.5294, 100.33658, 18.833616, 15.802424, 80.51665, 16.677374]
2025-08-07 06:55:12,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 20.0, 56.0, 67.0, 124.0, 70.0, 20.0, 22.0, 74.0, 21.0]
2025-08-07 06:55:12,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 25 minutes, 59 seconds)
2025-08-07 06:56:47,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:56:47,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 45.69077 ± 40.442
2025-08-07 06:56:47,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [62.7391, 150.40857, 21.436352, 21.968132, 35.28755, 9.93841, 77.621666, 41.259563, 16.942112, 19.306261]
2025-08-07 06:56:47,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 106.0, 23.0, 21.0, 36.0, 12.0, 70.0, 52.0, 20.0, 22.0]
2025-08-07 06:56:47,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 24 minutes, 28 seconds)
2025-08-07 06:58:23,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:24,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 110.40529 ± 78.980
2025-08-07 06:58:24,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [169.47612, 72.902214, 46.592155, 121.762955, 84.40261, 97.73866, 21.61706, 29.44905, 161.5908, 298.52124]
2025-08-07 06:58:24,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 58.0, 49.0, 77.0, 58.0, 56.0, 25.0, 42.0, 97.0, 145.0]
2025-08-07 06:58:24,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (110.41) for latency ExtremeClogL1U23
2025-08-07 06:58:24,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 23 minutes, 24 seconds)
2025-08-07 06:59:58,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:58,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 67.44801 ± 66.955
2025-08-07 06:59:58,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [63.84898, 228.16035, 135.1237, 17.37321, 21.80827, 15.901945, 58.365993, 11.412381, 105.18062, 17.304613]
2025-08-07 06:59:58,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 116.0, 107.0, 24.0, 24.0, 23.0, 53.0, 14.0, 70.0, 19.0]
2025-08-07 06:59:58,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 21 minutes, 39 seconds)
2025-08-07 07:01:34,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:35,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 107.31675 ± 102.629
2025-08-07 07:01:35,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [24.708473, 395.01788, 106.24648, 14.205541, 114.94716, 100.3308, 68.03409, 70.95851, 47.870037, 130.84857]
2025-08-07 07:01:35,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 203.0, 78.0, 17.0, 73.0, 73.0, 58.0, 78.0, 52.0, 95.0]
2025-08-07 07:01:35,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 20 minutes, 10 seconds)
2025-08-07 07:03:10,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:10,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 51.62188 ± 33.128
2025-08-07 07:03:10,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [14.805852, 45.18217, 26.006496, 129.88007, 11.817768, 49.375946, 54.997314, 57.18161, 42.460663, 84.510864]
2025-08-07 07:03:10,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 42.0, 24.0, 90.0, 14.0, 47.0, 54.0, 45.0, 51.0, 51.0]
2025-08-07 07:03:10,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 18 minutes, 45 seconds)
2025-08-07 07:04:45,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:04:45,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 60.91952 ± 47.283
2025-08-07 07:04:45,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [14.86009, 22.005945, 35.23547, 46.29922, 11.787244, 91.26031, 24.106524, 88.88947, 118.87667, 155.87425]
2025-08-07 07:04:45,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 37.0, 43.0, 46.0, 15.0, 71.0, 23.0, 60.0, 90.0, 88.0]
2025-08-07 07:04:45,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 17 minutes, 5 seconds)
2025-08-07 07:06:20,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:06:21,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 59.01983 ± 41.468
2025-08-07 07:06:21,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [10.285691, 8.817955, 83.681984, 137.34987, 118.955215, 46.410763, 67.13504, 58.016346, 26.058271, 33.4872]
2025-08-07 07:06:21,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 12.0, 54.0, 94.0, 83.0, 45.0, 53.0, 49.0, 25.0, 34.0]
2025-08-07 07:06:21,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 15 minutes, 14 seconds)
2025-08-07 07:07:56,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 78.15208 ± 60.151
2025-08-07 07:07:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.246813, 26.51124, 15.290588, 53.027546, 18.642744, 122.91957, 75.40865, 111.86725, 195.71692, 145.88943]
2025-08-07 07:07:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 23.0, 23.0, 56.0, 22.0, 88.0, 68.0, 92.0, 132.0, 83.0]
2025-08-07 07:07:57,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 13 minutes, 56 seconds)
2025-08-07 07:09:31,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:32,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 32.93794 ± 30.750
2025-08-07 07:09:32,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [20.235533, 17.79278, 114.04967, 63.887684, 10.380535, 12.7923565, 16.49468, 18.078663, 33.74764, 21.919859]
2025-08-07 07:09:32,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 19.0, 81.0, 57.0, 14.0, 21.0, 22.0, 24.0, 41.0, 22.0]
2025-08-07 07:09:32,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 11 minutes, 57 seconds)
2025-08-07 07:11:06,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:07,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 64.54688 ± 59.650
2025-08-07 07:11:07,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [21.018309, 71.918274, 70.76485, 12.644538, 77.68393, 17.77287, 170.42937, 175.08186, 11.265737, 16.889008]
2025-08-07 07:11:07,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 60.0, 63.0, 16.0, 69.0, 21.0, 98.0, 109.0, 14.0, 37.0]
2025-08-07 07:11:07,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 10 minutes, 17 seconds)
2025-08-07 07:12:42,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:43,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 49.63329 ± 43.438
2025-08-07 07:12:43,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [17.265993, 19.349005, 69.06481, 93.63221, 147.38242, 14.090801, 79.445, 19.748772, 20.596388, 15.757576]
2025-08-07 07:12:43,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 20.0, 57.0, 84.0, 95.0, 16.0, 94.0, 28.0, 19.0, 22.0]
2025-08-07 07:12:43,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 8 minutes, 52 seconds)
2025-08-07 07:14:18,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:18,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 27.96303 ± 28.643
2025-08-07 07:14:18,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [26.295656, 112.3842, 14.188692, 13.822657, 21.787348, 11.479837, 22.623262, 10.63324, 22.60242, 23.813013]
2025-08-07 07:14:18,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 84.0, 19.0, 16.0, 23.0, 20.0, 26.0, 18.0, 23.0, 40.0]
2025-08-07 07:14:18,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 7 minutes, 16 seconds)
2025-08-07 07:15:54,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:55,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 94.94337 ± 66.808
2025-08-07 07:15:55,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.42595, 198.26108, 14.086123, 95.97774, 66.64653, 13.318909, 85.75136, 104.58124, 157.19745, 195.1873]
2025-08-07 07:15:55,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 118.0, 15.0, 70.0, 50.0, 15.0, 71.0, 72.0, 114.0, 98.0]
2025-08-07 07:15:55,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 5 minutes, 48 seconds)
2025-08-07 07:17:28,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:29,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 46.29453 ± 28.908
2025-08-07 07:17:29,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.305778, 78.340416, 16.606998, 17.72221, 53.743217, 54.4125, 15.693271, 81.51885, 40.05553, 92.546555]
2025-08-07 07:17:29,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 65.0, 20.0, 19.0, 45.0, 45.0, 24.0, 63.0, 40.0, 66.0]
2025-08-07 07:17:29,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 4 minutes, 4 seconds)
2025-08-07 07:19:04,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:05,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 68.88155 ± 61.088
2025-08-07 07:19:05,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [52.239647, 19.185532, 73.185875, 92.91851, 67.5295, 15.023549, 16.012793, 21.361666, 105.348915, 226.00958]
2025-08-07 07:19:05,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 20.0, 65.0, 68.0, 59.0, 18.0, 17.0, 23.0, 72.0, 128.0]
2025-08-07 07:19:05,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 2 minutes, 35 seconds)
2025-08-07 07:20:39,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:40,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 72.07974 ± 44.915
2025-08-07 07:20:40,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [34.115967, 17.603249, 92.10953, 152.08745, 74.429855, 58.690792, 9.36303, 109.03714, 47.61934, 125.74109]
2025-08-07 07:20:40,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [36.0, 23.0, 53.0, 101.0, 51.0, 63.0, 12.0, 85.0, 44.0, 72.0]
2025-08-07 07:20:40,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 53 seconds)
2025-08-07 07:22:14,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:15,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 58.55150 ± 32.016
2025-08-07 07:22:15,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [107.71437, 54.2282, 15.675524, 66.583626, 62.468887, 86.49673, 14.471947, 75.572014, 13.914509, 88.38922]
2025-08-07 07:22:15,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 47.0, 19.0, 54.0, 48.0, 56.0, 17.0, 54.0, 22.0, 74.0]
2025-08-07 07:22:15,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 59 minutes, 3 seconds)
2025-08-07 07:23:48,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:49,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 37.44949 ± 27.708
2025-08-07 07:23:49,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [73.9269, 14.605441, 15.325125, 14.175848, 17.953876, 26.938461, 91.86183, 16.882643, 35.137062, 67.68772]
2025-08-07 07:23:49,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [51.0, 20.0, 18.0, 23.0, 23.0, 43.0, 73.0, 17.0, 36.0, 55.0]
2025-08-07 07:23:49,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 56 minutes, 55 seconds)
2025-08-07 07:25:22,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:23,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 77.89138 ± 49.590
2025-08-07 07:25:23,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [150.82106, 13.4706335, 125.95669, 80.4438, 78.596924, 15.920095, 27.472006, 154.38571, 77.85508, 53.991802]
2025-08-07 07:25:23,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 16.0, 89.0, 71.0, 65.0, 16.0, 24.0, 107.0, 66.0, 45.0]
2025-08-07 07:25:23,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 55 minutes, 18 seconds)
2025-08-07 07:26:56,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:57,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 90.68591 ± 49.415
2025-08-07 07:26:57,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [14.636964, 60.25063, 161.32845, 56.17816, 146.7268, 85.68472, 49.339825, 58.346992, 111.863434, 162.50314]
2025-08-07 07:26:57,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 51.0, 95.0, 59.0, 93.0, 67.0, 49.0, 52.0, 81.0, 95.0]
2025-08-07 07:26:57,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 53 minutes, 22 seconds)
2025-08-07 07:28:30,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:31,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 91.71684 ± 80.765
2025-08-07 07:28:31,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [115.02771, 290.3623, 25.092272, 160.5446, 54.94103, 106.52282, 10.204752, 51.668854, 12.798979, 90.005104]
2025-08-07 07:28:31,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 136.0, 48.0, 95.0, 48.0, 71.0, 16.0, 61.0, 15.0, 86.0]
2025-08-07 07:28:31,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 51 minutes, 31 seconds)
2025-08-07 07:30:05,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:06,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 69.33414 ± 56.228
2025-08-07 07:30:06,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [89.02734, 57.74233, 209.15706, 16.520016, 111.19791, 70.19438, 38.648632, 13.762025, 72.18726, 14.904469]
2025-08-07 07:30:06,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 60.0, 121.0, 18.0, 98.0, 49.0, 60.0, 17.0, 60.0, 24.0]
2025-08-07 07:30:06,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 49 minutes, 55 seconds)
2025-08-07 07:31:39,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:40,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 82.99352 ± 39.762
2025-08-07 07:31:40,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [131.10208, 74.56244, 128.60406, 51.3875, 75.53872, 115.47074, 117.395905, 23.603676, 16.879995, 95.390076]
2025-08-07 07:31:40,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 57.0, 68.0, 57.0, 57.0, 77.0, 83.0, 23.0, 22.0, 75.0]
2025-08-07 07:31:40,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 48 minutes, 26 seconds)
2025-08-07 07:33:13,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:14,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 94.68918 ± 74.247
2025-08-07 07:33:14,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [25.354963, 17.536701, 22.15224, 22.551523, 183.36722, 73.95943, 150.19638, 152.60799, 230.65599, 68.50935]
2025-08-07 07:33:14,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 17.0, 23.0, 25.0, 103.0, 47.0, 93.0, 121.0, 134.0, 57.0]
2025-08-07 07:33:14,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 46 minutes, 50 seconds)
2025-08-07 07:34:47,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:48,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 87.75385 ± 101.000
2025-08-07 07:34:48,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.683767, 73.60035, 14.631121, 108.651794, 120.756294, 13.07206, 19.712475, 355.7988, 138.96275, 13.669147]
2025-08-07 07:34:48,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 57.0, 21.0, 89.0, 88.0, 19.0, 22.0, 170.0, 87.0, 17.0]
2025-08-07 07:34:48,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 45 minutes, 11 seconds)
2025-08-07 07:36:21,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:22,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 62.50001 ± 42.545
2025-08-07 07:36:22,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [50.459206, 62.89253, 118.572, 85.00255, 14.032469, 30.875685, 111.15202, 17.758642, 124.33054, 9.924473]
2025-08-07 07:36:22,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 83.0, 105.0, 76.0, 15.0, 34.0, 87.0, 22.0, 80.0, 16.0]
2025-08-07 07:36:22,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 43 minutes, 37 seconds)
2025-08-07 07:37:56,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:37:57,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 34.83478 ± 36.306
2025-08-07 07:37:57,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [17.009197, 17.872389, 12.959894, 17.86078, 17.64836, 19.053696, 21.505663, 131.29343, 19.116793, 74.027596]
2025-08-07 07:37:57,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 22.0, 18.0, 24.0, 20.0, 24.0, 27.0, 93.0, 20.0, 63.0]
2025-08-07 07:37:57,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 42 minutes, 2 seconds)
2025-08-07 07:39:31,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:32,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 87.95547 ± 71.666
2025-08-07 07:39:32,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.886603, 11.064661, 188.91734, 152.47044, 147.19324, 21.697397, 21.643078, 153.8951, 17.550854, 151.23595]
2025-08-07 07:39:32,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 13.0, 104.0, 80.0, 102.0, 24.0, 22.0, 104.0, 18.0, 87.0]
2025-08-07 07:39:32,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 40 minutes, 39 seconds)
2025-08-07 07:41:06,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:07,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 136.53020 ± 117.938
2025-08-07 07:41:07,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [68.794266, 15.698583, 242.45973, 12.917627, 159.87544, 57.36001, 11.940211, 326.6307, 146.46376, 323.16156]
2025-08-07 07:41:07,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 18.0, 127.0, 17.0, 110.0, 50.0, 14.0, 155.0, 96.0, 156.0]
2025-08-07 07:41:07,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (136.53) for latency ExtremeClogL1U23
2025-08-07 07:41:07,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 39 minutes, 18 seconds)
2025-08-07 07:42:41,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:41,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 75.14056 ± 48.768
2025-08-07 07:42:41,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.079047, 97.89074, 63.427547, 11.2562895, 90.15375, 158.86823, 142.0113, 58.032433, 24.387177, 94.29906]
2025-08-07 07:42:41,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 103.0, 61.0, 19.0, 57.0, 101.0, 83.0, 47.0, 24.0, 71.0]
2025-08-07 07:42:41,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 37 minutes, 47 seconds)
2025-08-07 07:44:16,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:17,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 72.64242 ± 60.107
2025-08-07 07:44:17,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [48.568375, 14.415691, 178.61887, 73.84135, 58.559654, 168.00453, 23.238554, 11.134046, 23.065132, 126.97803]
2025-08-07 07:44:17,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 17.0, 104.0, 82.0, 57.0, 110.0, 21.0, 23.0, 23.0, 76.0]
2025-08-07 07:44:17,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 36 minutes, 31 seconds)
2025-08-07 07:45:51,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:52,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 78.30716 ± 86.988
2025-08-07 07:45:52,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [70.51978, 10.972424, 105.64574, 302.3005, 142.05545, 13.188476, 17.247635, 15.451137, 90.75028, 14.940197]
2025-08-07 07:45:52,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 17.0, 75.0, 134.0, 107.0, 20.0, 23.0, 19.0, 82.0, 20.0]
2025-08-07 07:45:52,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 34 minutes, 58 seconds)
2025-08-07 07:47:26,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:27,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 56.71644 ± 53.513
2025-08-07 07:47:27,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.868483, 130.41986, 88.54978, 11.3996105, 148.10692, 18.1864, 19.37846, 112.636986, 12.406487, 13.211353]
2025-08-07 07:47:27,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 85.0, 84.0, 17.0, 96.0, 23.0, 24.0, 88.0, 17.0, 19.0]
2025-08-07 07:47:27,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 33 minutes, 23 seconds)
2025-08-07 07:49:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:01,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 96.61053 ± 82.212
2025-08-07 07:49:01,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [195.13287, 217.82678, 234.6713, 96.069664, 13.080773, 25.614996, 74.54407, 45.6118, 14.243964, 49.30912]
2025-08-07 07:49:01,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 117.0, 131.0, 58.0, 22.0, 24.0, 84.0, 67.0, 17.0, 47.0]
2025-08-07 07:49:01,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 31 minutes, 42 seconds)
2025-08-07 07:50:36,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:37,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 69.24796 ± 55.609
2025-08-07 07:50:37,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [19.048983, 17.791243, 76.51501, 13.527909, 133.65414, 109.68914, 12.377992, 134.53625, 21.943567, 153.39534]
2025-08-07 07:50:37,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 23.0, 50.0, 17.0, 71.0, 69.0, 15.0, 84.0, 24.0, 95.0]
2025-08-07 07:50:37,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 30 minutes, 20 seconds)
2025-08-07 07:52:09,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:10,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 109.09885 ± 75.591
2025-08-07 07:52:10,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.067145, 68.577545, 210.08589, 115.12179, 65.92663, 66.71944, 246.40323, 118.63646, 14.831449, 173.61899]
2025-08-07 07:52:10,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 58.0, 123.0, 102.0, 49.0, 52.0, 143.0, 80.0, 18.0, 103.0]
2025-08-07 07:52:10,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 28 minutes, 24 seconds)
2025-08-07 07:53:44,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:53:44,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 102.86875 ± 69.823
2025-08-07 07:53:44,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [132.52264, 14.353916, 118.202576, 227.66579, 102.45499, 12.032932, 196.9825, 88.98447, 20.518639, 114.96907]
2025-08-07 07:53:44,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 21.0, 86.0, 126.0, 70.0, 17.0, 111.0, 68.0, 24.0, 84.0]
2025-08-07 07:53:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 26 minutes, 41 seconds)
2025-08-07 07:55:19,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:20,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 98.11070 ± 50.102
2025-08-07 07:55:20,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [68.217995, 13.909046, 55.293274, 192.34879, 54.481754, 92.40389, 132.52335, 104.38898, 152.49944, 115.040535]
2025-08-07 07:55:20,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 20.0, 52.0, 104.0, 47.0, 90.0, 74.0, 71.0, 111.0, 79.0]
2025-08-07 07:55:20,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 25 minutes, 8 seconds)
2025-08-07 07:56:54,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:54,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 73.96232 ± 48.414
2025-08-07 07:56:54,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [20.637758, 30.40521, 151.6037, 130.15175, 14.282372, 15.013234, 71.35725, 100.212494, 96.027466, 109.931946]
2025-08-07 07:56:54,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 26.0, 107.0, 74.0, 19.0, 16.0, 49.0, 69.0, 64.0, 75.0]
2025-08-07 07:56:54,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 23 minutes, 35 seconds)
2025-08-07 07:58:29,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:58:29,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 73.83359 ± 51.108
2025-08-07 07:58:29,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [19.820805, 68.4947, 107.592865, 118.46632, 101.848, 12.666053, 152.00644, 125.417564, 10.661675, 21.36145]
2025-08-07 07:58:29,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 59.0, 67.0, 87.0, 67.0, 16.0, 89.0, 92.0, 13.0, 25.0]
2025-08-07 07:58:29,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 21 minutes, 54 seconds)
2025-08-07 08:00:04,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:00:05,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 130.93163 ± 86.918
2025-08-07 08:00:05,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [153.82634, 193.8228, 98.19084, 227.59978, 272.71375, 17.527857, 141.20653, 14.31801, 19.524801, 170.58554]
2025-08-07 08:00:05,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 107.0, 80.0, 117.0, 157.0, 22.0, 90.0, 18.0, 22.0, 88.0]
2025-08-07 08:00:05,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 20 minutes, 37 seconds)
2025-08-07 08:01:40,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:40,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 82.01380 ± 58.740
2025-08-07 08:01:40,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [84.21755, 93.28318, 130.49545, 156.06784, 175.39061, 30.241247, 11.768517, 111.25583, 16.6177, 10.800082]
2025-08-07 08:01:40,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 59.0, 78.0, 95.0, 102.0, 33.0, 13.0, 88.0, 19.0, 16.0]
2025-08-07 08:01:40,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 19 minutes, 18 seconds)
2025-08-07 08:03:14,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:15,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 88.35637 ± 92.359
2025-08-07 08:03:15,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [108.223595, 14.469955, 10.015358, 12.662889, 315.69427, 18.743725, 129.17073, 19.773201, 155.98837, 98.82162]
2025-08-07 08:03:15,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 21.0, 13.0, 19.0, 149.0, 18.0, 90.0, 20.0, 94.0, 77.0]
2025-08-07 08:03:15,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 17 minutes, 40 seconds)
2025-08-07 08:04:50,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:50,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 57.01336 ± 64.151
2025-08-07 08:04:50,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [159.046, 10.92403, 9.152033, 12.382636, 158.61418, 146.24284, 14.166024, 24.499632, 12.836114, 22.27015]
2025-08-07 08:04:50,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 18.0, 15.0, 15.0, 93.0, 88.0, 18.0, 24.0, 19.0, 23.0]
2025-08-07 08:04:50,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 16 minutes, 10 seconds)
2025-08-07 08:06:25,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:26,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 115.36271 ± 90.953
2025-08-07 08:06:26,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [273.7224, 103.82879, 214.84427, 14.720662, 17.842194, 162.13466, 154.46889, 12.248518, 18.819952, 180.9967]
2025-08-07 08:06:26,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 73.0, 131.0, 19.0, 21.0, 108.0, 100.0, 21.0, 23.0, 94.0]
2025-08-07 08:06:26,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 14 minutes, 43 seconds)
2025-08-07 08:08:00,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:08:01,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 74.94691 ± 50.635
2025-08-07 08:08:01,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [164.2757, 16.35639, 34.16139, 105.29482, 87.7838, 11.958937, 16.087355, 74.1581, 115.25364, 124.139046]
2025-08-07 08:08:01,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 25.0, 65.0, 100.0, 103.0, 17.0, 18.0, 58.0, 83.0, 82.0]
2025-08-07 08:08:01,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 13 minutes, 4 seconds)
2025-08-07 08:09:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:37,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 131.93483 ± 108.460
2025-08-07 08:09:37,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [17.664515, 115.44041, 143.39812, 25.418295, 19.531532, 127.59351, 346.3241, 296.2151, 171.20795, 56.554775]
2025-08-07 08:09:37,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 87.0, 86.0, 23.0, 24.0, 109.0, 182.0, 148.0, 103.0, 36.0]
2025-08-07 08:09:37,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 11 minutes, 30 seconds)
2025-08-07 08:11:11,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 110.50635 ± 94.875
2025-08-07 08:11:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [146.08318, 137.86456, 81.05583, 21.929733, 131.56616, 142.65854, 57.467197, 350.87045, 14.213605, 21.35413]
2025-08-07 08:11:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 101.0, 96.0, 22.0, 91.0, 100.0, 84.0, 167.0, 19.0, 25.0]
2025-08-07 08:11:12,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 9 minutes, 57 seconds)
2025-08-07 08:12:47,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:48,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 102.91093 ± 105.460
2025-08-07 08:12:48,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.702489, 80.13808, 10.933289, 114.41197, 20.134556, 273.5756, 97.302345, 85.83965, 324.27643, 10.795024]
2025-08-07 08:12:48,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 53.0, 17.0, 98.0, 20.0, 145.0, 59.0, 59.0, 136.0, 14.0]
2025-08-07 08:12:48,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 8 minutes, 24 seconds)
2025-08-07 08:14:23,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:24,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 99.13942 ± 62.631
2025-08-07 08:14:24,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [88.48381, 185.73048, 104.89632, 166.98564, 122.609566, 94.83366, 177.17729, 14.939646, 14.701958, 21.035833]
2025-08-07 08:14:24,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 116.0, 82.0, 97.0, 87.0, 93.0, 109.0, 19.0, 16.0, 23.0]
2025-08-07 08:14:24,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 6 minutes, 47 seconds)
2025-08-07 08:15:58,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:15:58,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 92.43654 ± 96.971
2025-08-07 08:15:58,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [15.867538, 48.952164, 206.20383, 14.9616995, 300.2818, 12.422675, 147.90619, 16.830435, 14.691797, 146.24733]
2025-08-07 08:15:58,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 85.0, 111.0, 22.0, 134.0, 14.0, 80.0, 22.0, 15.0, 88.0]
2025-08-07 08:15:58,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 5 minutes, 12 seconds)
2025-08-07 08:17:33,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:34,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 120.86792 ± 83.180
2025-08-07 08:17:34,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.991425, 134.58963, 203.55696, 110.26112, 11.107425, 15.110505, 216.0284, 245.32294, 86.71192, 168.99892]
2025-08-07 08:17:34,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 84.0, 124.0, 81.0, 18.0, 16.0, 127.0, 143.0, 68.0, 96.0]
2025-08-07 08:17:34,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 3 minutes, 35 seconds)
2025-08-07 08:19:08,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:09,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 155.88437 ± 94.803
2025-08-07 08:19:09,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [128.47601, 94.15844, 19.485037, 160.54568, 221.32956, 346.1123, 199.45021, 212.19963, 10.827291, 166.25957]
2025-08-07 08:19:09,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [76.0, 76.0, 19.0, 98.0, 123.0, 185.0, 136.0, 128.0, 15.0, 99.0]
2025-08-07 08:19:09,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (155.88) for latency ExtremeClogL1U23
2025-08-07 08:19:09,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 2 minutes, 1 second)
2025-08-07 08:20:44,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:20:44,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 100.67696 ± 58.309
2025-08-07 08:20:44,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [201.85149, 173.74147, 159.47772, 98.817276, 99.28771, 15.45495, 80.40098, 79.753174, 21.535242, 76.44957]
2025-08-07 08:20:44,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [142.0, 102.0, 95.0, 66.0, 67.0, 23.0, 50.0, 66.0, 23.0, 48.0]
2025-08-07 08:20:45,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 23 seconds)
2025-08-07 08:22:20,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:22:21,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 121.07108 ± 107.306
2025-08-07 08:22:21,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.365822, 102.51664, 112.743286, 19.278076, 215.11006, 11.23026, 347.90384, 9.701396, 173.09044, 200.77104]
2025-08-07 08:22:21,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 75.0, 77.0, 25.0, 105.0, 15.0, 149.0, 14.0, 115.0, 112.0]
2025-08-07 08:22:21,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 58 minutes, 50 seconds)
2025-08-07 08:23:55,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:56,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 116.49586 ± 106.540
2025-08-07 08:23:56,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [26.056337, 358.49295, 150.60527, 16.920946, 77.00398, 20.507807, 222.62453, 99.90426, 17.14373, 175.69882]
2025-08-07 08:23:56,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 219.0, 91.0, 22.0, 53.0, 23.0, 117.0, 76.0, 22.0, 100.0]
2025-08-07 08:23:56,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 57 minutes, 17 seconds)
2025-08-07 08:25:30,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:32,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 176.01656 ± 101.488
2025-08-07 08:25:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [170.47604, 175.01292, 341.50555, 358.12204, 112.75608, 75.08348, 213.38246, 135.01294, 18.17187, 160.6422]
2025-08-07 08:25:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 97.0, 185.0, 150.0, 70.0, 69.0, 108.0, 93.0, 22.0, 122.0]
2025-08-07 08:25:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (176.02) for latency ExtremeClogL1U23
2025-08-07 08:25:32,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 55 minutes, 43 seconds)
2025-08-07 08:27:06,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:07,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 101.80196 ± 48.334
2025-08-07 08:27:07,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [140.67412, 15.888513, 114.46181, 20.805504, 150.58116, 140.18915, 150.60016, 95.71022, 68.35865, 120.75035]
2025-08-07 08:27:07,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 19.0, 80.0, 24.0, 99.0, 83.0, 106.0, 60.0, 45.0, 86.0]
2025-08-07 08:27:07,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 54 minutes, 6 seconds)
2025-08-07 08:28:42,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:28:43,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 88.70818 ± 102.295
2025-08-07 08:28:43,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [123.818184, 130.16054, 13.728784, 17.311779, 15.821445, 86.22807, 102.33539, 363.05627, 15.64295, 18.978462]
2025-08-07 08:28:43,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 156.0, 18.0, 21.0, 22.0, 64.0, 63.0, 189.0, 18.0, 23.0]
2025-08-07 08:28:43,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 52 minutes, 38 seconds)
2025-08-07 08:30:17,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:30:17,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 103.42584 ± 120.867
2025-08-07 08:30:17,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [22.043734, 103.70307, 14.188793, 43.74283, 11.600182, 95.97807, 161.41237, 140.54613, 429.434, 11.609287]
2025-08-07 08:30:17,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 80.0, 18.0, 38.0, 20.0, 61.0, 93.0, 88.0, 185.0, 16.0]
2025-08-07 08:30:17,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 50 minutes, 51 seconds)
2025-08-07 08:31:53,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:54,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 146.26701 ± 131.369
2025-08-07 08:31:54,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.016142, 182.64064, 184.60405, 109.43579, 66.66636, 23.124039, 460.6997, 140.71219, 14.949527, 263.82172]
2025-08-07 08:31:54,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 117.0, 101.0, 84.0, 57.0, 25.0, 180.0, 95.0, 24.0, 126.0]
2025-08-07 08:31:54,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 49 minutes, 24 seconds)
2025-08-07 08:33:29,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:30,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 150.48294 ± 111.021
2025-08-07 08:33:30,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [103.45274, 108.355644, 361.7429, 209.04834, 213.79564, 109.7716, 299.19443, 70.60696, 18.180155, 10.681031]
2025-08-07 08:33:30,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 77.0, 163.0, 118.0, 105.0, 64.0, 165.0, 81.0, 24.0, 15.0]
2025-08-07 08:33:30,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 47 minutes, 50 seconds)
2025-08-07 08:35:06,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:07,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 74.83915 ± 50.836
2025-08-07 08:35:07,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.170477, 163.02269, 91.8515, 20.972462, 66.608315, 13.488332, 102.09915, 25.84282, 126.11412, 120.22163]
2025-08-07 08:35:07,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 88.0, 72.0, 21.0, 75.0, 16.0, 66.0, 25.0, 84.0, 106.0]
2025-08-07 08:35:07,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 46 minutes, 25 seconds)
2025-08-07 08:36:42,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:36:43,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 78.28596 ± 69.280
2025-08-07 08:36:43,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.098743, 93.13661, 26.489647, 250.46901, 15.638978, 49.46584, 85.60004, 108.167946, 122.22051, 18.572289]
2025-08-07 08:36:43,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 53.0, 22.0, 130.0, 16.0, 82.0, 77.0, 67.0, 81.0, 22.0]
2025-08-07 08:36:43,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes, 46 seconds)
2025-08-07 08:38:19,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:20,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 78.72475 ± 57.365
2025-08-07 08:38:20,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [95.51279, 17.05406, 89.01726, 15.155137, 183.98775, 137.29907, 99.74124, 119.186066, 14.506803, 15.787439]
2025-08-07 08:38:20,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 24.0, 67.0, 22.0, 87.0, 101.0, 59.0, 99.0, 17.0, 23.0]
2025-08-07 08:38:20,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 43 minutes, 23 seconds)
2025-08-07 08:39:55,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:56,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 74.49838 ± 62.649
2025-08-07 08:39:56,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [119.01807, 18.905907, 112.71924, 115.140114, 14.826884, 171.16084, 13.717391, 13.579139, 155.79637, 10.119897]
2025-08-07 08:39:56,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [74.0, 21.0, 76.0, 70.0, 20.0, 103.0, 23.0, 15.0, 93.0, 14.0]
2025-08-07 08:39:56,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 47 seconds)
2025-08-07 08:41:32,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:33,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 63.39269 ± 83.208
2025-08-07 08:41:33,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.739582, 293.91205, 65.05761, 15.887764, 72.26623, 16.631365, 11.8590975, 16.839888, 110.171455, 14.561839]
2025-08-07 08:41:33,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 132.0, 43.0, 18.0, 54.0, 21.0, 15.0, 18.0, 80.0, 17.0]
2025-08-07 08:41:33,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 14 seconds)
2025-08-07 08:43:08,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:10,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 164.99557 ± 109.897
2025-08-07 08:43:10,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.033321, 384.264, 166.78766, 303.33, 85.29129, 218.71413, 147.95631, 16.977264, 142.24034, 166.36148]
2025-08-07 08:43:10,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 213.0, 112.0, 112.0, 58.0, 144.0, 81.0, 24.0, 94.0, 105.0]
2025-08-07 08:43:10,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 36 seconds)
2025-08-07 08:44:45,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:44:45,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 63.72811 ± 102.523
2025-08-07 08:44:45,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.446807, 25.55834, 15.232986, 23.793934, 13.609201, 25.700602, 13.910394, 350.44888, 14.016103, 141.5638]
2025-08-07 08:44:45,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 23.0, 20.0, 25.0, 20.0, 24.0, 21.0, 143.0, 18.0, 105.0]
2025-08-07 08:44:45,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 58 seconds)
2025-08-07 08:46:21,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:22,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 123.47990 ± 89.092
2025-08-07 08:46:22,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [118.84572, 55.386818, 196.46701, 90.50141, 122.699715, 119.960434, 328.1792, 15.807911, 12.876864, 174.07394]
2025-08-07 08:46:22,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 50.0, 104.0, 64.0, 91.0, 94.0, 159.0, 20.0, 18.0, 119.0]
2025-08-07 08:46:22,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 21 seconds)
2025-08-07 08:47:57,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:58,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 139.06686 ± 97.349
2025-08-07 08:47:58,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [14.002438, 106.61368, 93.611534, 356.45035, 120.644615, 242.44167, 187.91098, 142.41013, 107.14175, 19.441706]
2025-08-07 08:47:58,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 72.0, 75.0, 175.0, 93.0, 135.0, 107.0, 168.0, 81.0, 20.0]
2025-08-07 08:47:58,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 45 seconds)
2025-08-07 08:49:34,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:49:34,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 69.89259 ± 95.564
2025-08-07 08:49:34,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.520243, 16.375202, 7.4353523, 124.83993, 11.842996, 15.557652, 15.255235, 306.6809, 170.66376, 17.754688]
2025-08-07 08:49:34,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 21.0, 12.0, 92.0, 21.0, 17.0, 19.0, 151.0, 100.0, 23.0]
2025-08-07 08:49:34,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 6 seconds)
2025-08-07 08:51:10,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:11,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 96.17055 ± 73.854
2025-08-07 08:51:11,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [179.17754, 20.064734, 187.00233, 22.742403, 16.666058, 10.404279, 124.231186, 171.59727, 175.59021, 54.22951]
2025-08-07 08:51:11,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 20.0, 119.0, 26.0, 22.0, 14.0, 75.0, 95.0, 95.0, 102.0]
2025-08-07 08:51:11,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 29 seconds)
2025-08-07 08:52:46,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:52:47,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 133.92477 ± 133.029
2025-08-07 08:52:47,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.691963, 21.277174, 344.07025, 191.7039, 18.412708, 123.119934, 95.11234, 13.303859, 111.42067, 404.13495]
2025-08-07 08:52:47,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 22.0, 149.0, 115.0, 23.0, 93.0, 83.0, 18.0, 78.0, 188.0]
2025-08-07 08:52:47,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 53 seconds)
2025-08-07 08:54:23,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:54:23,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 92.95048 ± 68.574
2025-08-07 08:54:23,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.365556, 16.188742, 21.73985, 225.31857, 120.495155, 145.86374, 121.330986, 16.689508, 118.52311, 129.98949]
2025-08-07 08:54:23,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 17.0, 21.0, 123.0, 92.0, 117.0, 76.0, 18.0, 82.0, 105.0]
2025-08-07 08:54:23,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 17 seconds)
2025-08-07 08:55:59,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:59,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 89.77357 ± 80.969
2025-08-07 08:55:59,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.89077, 12.502821, 129.43457, 153.0482, 14.065713, 224.04457, 20.341225, 20.299364, 102.42626, 209.68225]
2025-08-07 08:55:59,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 20.0, 81.0, 84.0, 24.0, 119.0, 25.0, 24.0, 63.0, 101.0]
2025-08-07 08:55:59,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 39 seconds)
2025-08-07 08:57:35,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:36,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 119.83098 ± 116.886
2025-08-07 08:57:36,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.764791, 87.166435, 104.71255, 10.77586, 205.6624, 13.660612, 171.45006, 396.03156, 180.00867, 15.076918]
2025-08-07 08:57:36,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 72.0, 76.0, 16.0, 95.0, 18.0, 112.0, 161.0, 112.0, 21.0]
2025-08-07 08:57:36,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 5 seconds)
2025-08-07 08:59:12,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:59:13,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 163.44427 ± 110.963
2025-08-07 08:59:13,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [269.8771, 143.88503, 437.8409, 181.27383, 88.466995, 109.060234, 152.38585, 15.430665, 141.41544, 94.806595]
2025-08-07 08:59:13,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 80.0, 170.0, 108.0, 76.0, 70.0, 123.0, 18.0, 85.0, 66.0]
2025-08-07 08:59:13,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 29 seconds)
2025-08-07 09:00:48,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:50,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 218.12175 ± 162.053
2025-08-07 09:00:50,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [229.27954, 183.98935, 438.96127, 16.392958, 96.54666, 587.70496, 102.33976, 143.24213, 186.72371, 196.03728]
2025-08-07 09:00:50,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 147.0, 184.0, 17.0, 68.0, 209.0, 72.0, 113.0, 110.0, 97.0]
2025-08-07 09:00:50,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (218.12) for latency ExtremeClogL1U23
2025-08-07 09:00:50,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 55 seconds)
2025-08-07 09:02:25,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:02:26,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 232.24614 ± 176.507
2025-08-07 09:02:26,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [488.56558, 240.63887, 10.808857, 177.59155, 233.74664, 250.43518, 11.357438, 203.336, 592.4045, 113.57689]
2025-08-07 09:02:26,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [171.0, 122.0, 17.0, 101.0, 125.0, 144.0, 26.0, 103.0, 236.0, 124.0]
2025-08-07 09:02:26,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (232.25) for latency ExtremeClogL1U23
2025-08-07 09:02:26,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 19 seconds)
2025-08-07 09:04:03,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:04:04,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 135.84410 ± 91.826
2025-08-07 09:04:04,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [42.45046, 245.73236, 225.604, 140.62532, 83.045715, 12.596611, 102.66521, 19.924894, 241.0787, 244.71774]
2025-08-07 09:04:04,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [36.0, 128.0, 124.0, 85.0, 56.0, 17.0, 64.0, 20.0, 127.0, 133.0]
2025-08-07 09:04:04,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 45 seconds)
2025-08-07 09:05:40,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:41,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 57.34754 ± 75.749
2025-08-07 09:05:41,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.300514, 12.680146, 19.631687, 16.149393, 79.928505, 15.806355, 123.44435, 13.979921, 258.51758, 17.036907]
2025-08-07 09:05:41,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 24.0, 25.0, 66.0, 23.0, 75.0, 19.0, 131.0, 21.0]
2025-08-07 09:05:41,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 8 seconds)
2025-08-07 09:07:16,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:07:17,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 112.78976 ± 87.744
2025-08-07 09:07:17,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [175.68527, 146.9982, 243.11768, 16.270054, 23.743433, 241.77464, 124.42759, 132.62411, 10.9937725, 12.262945]
2025-08-07 09:07:17,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 89.0, 145.0, 22.0, 24.0, 131.0, 68.0, 90.0, 15.0, 18.0]
2025-08-07 09:07:17,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 31 seconds)
2025-08-07 09:08:52,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:53,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 100.56773 ± 84.000
2025-08-07 09:08:53,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.268589, 15.320684, 211.18033, 127.57503, 123.981544, 219.07195, 69.260284, 205.0311, 10.760194, 12.22765]
2025-08-07 09:08:53,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 23.0, 132.0, 76.0, 75.0, 194.0, 54.0, 156.0, 15.0, 15.0]
2025-08-07 09:08:53,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 53 seconds)
2025-08-07 09:10:29,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:30,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 116.08575 ± 101.117
2025-08-07 09:10:30,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [9.301691, 219.37897, 22.564281, 11.520644, 245.98547, 22.588171, 20.98083, 249.24767, 182.96277, 176.32703]
2025-08-07 09:10:30,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 118.0, 25.0, 15.0, 120.0, 25.0, 22.0, 126.0, 110.0, 103.0]
2025-08-07 09:10:30,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 16 seconds)
2025-08-07 09:12:05,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:12:06,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 81.24004 ± 83.386
2025-08-07 09:12:06,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [19.601784, 71.91202, 14.610509, 10.147965, 172.14119, 273.62637, 15.465757, 17.639023, 86.21048, 131.04529]
2025-08-07 09:12:06,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 54.0, 23.0, 16.0, 111.0, 136.0, 22.0, 20.0, 79.0, 92.0]
2025-08-07 09:12:06,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 38 seconds)
2025-08-07 09:13:42,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:13:43,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 118.54226 ± 78.890
2025-08-07 09:13:43,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [163.48575, 97.57262, 17.828608, 102.05709, 187.47324, 15.168804, 238.4748, 12.523109, 205.66945, 145.16922]
2025-08-07 09:13:43,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 70.0, 22.0, 88.0, 105.0, 20.0, 144.0, 17.0, 128.0, 88.0]
2025-08-07 09:13:43,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 1 second)
2025-08-07 09:15:18,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:15:19,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 112.18976 ± 86.311
2025-08-07 09:15:19,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [15.532534, 164.79741, 123.03897, 137.07684, 17.888126, 169.87607, 13.667671, 12.8458, 247.32454, 219.8497]
2025-08-07 09:15:19,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 93.0, 72.0, 83.0, 20.0, 97.0, 24.0, 18.0, 119.0, 121.0]
2025-08-07 09:15:19,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 25 seconds)
2025-08-07 09:16:55,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:56,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 68.27352 ± 59.045
2025-08-07 09:16:56,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.099814, 71.25126, 162.13976, 138.47533, 12.890612, 125.67924, 14.868615, 13.483022, 12.3877, 119.45992]
2025-08-07 09:16:56,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 61.0, 95.0, 90.0, 15.0, 69.0, 21.0, 19.0, 14.0, 92.0]
2025-08-07 09:16:56,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 49 seconds)
2025-08-07 09:18:32,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:18:32,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 99.88478 ± 117.383
2025-08-07 09:18:32,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [14.197081, 243.86188, 11.763951, 358.17444, 14.417637, 17.255678, 16.625418, 129.65111, 15.0144825, 177.88625]
2025-08-07 09:18:32,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 110.0, 15.0, 156.0, 19.0, 24.0, 24.0, 77.0, 23.0, 103.0]
2025-08-07 09:18:32,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 13 seconds)
2025-08-07 09:20:07,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:20:08,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 93.78808 ± 82.028
2025-08-07 09:20:08,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [21.17673, 199.9777, 11.830782, 18.9063, 15.38152, 103.35144, 176.4824, 187.02744, 11.011465, 192.73505]
2025-08-07 09:20:08,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 117.0, 16.0, 21.0, 17.0, 57.0, 97.0, 129.0, 18.0, 118.0]
2025-08-07 09:20:08,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 36 seconds)
2025-08-07 09:21:44,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:45,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 107.39158 ± 83.736
2025-08-07 09:21:45,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [106.95382, 254.35371, 17.195944, 11.62791, 213.79019, 136.24818, 159.772, 21.062778, 17.512316, 135.39891]
2025-08-07 09:21:45,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 114.0, 22.0, 16.0, 116.0, 77.0, 98.0, 24.0, 21.0, 74.0]
2025-08-07 09:21:45,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1251 [DEBUG]: Training session finished
