2025-08-07 04:01:32,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc20-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 04:01:32,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc20-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 04:01:32,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14ff00d4fc10>}
2025-08-07 04:01:32,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 04:01:32,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 04:01:32,901 baseline-bpql-noiseperc20-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:01:32,901 baseline-bpql-noiseperc20-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:01:35,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 04:01:35,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 04:03:08,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:08,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 5.90628 ± 4.249
2025-08-07 04:03:08,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.64014477, 10.905671, 7.5200105, 5.804757, 6.3333063, 15.094104, 4.0405736, 0.6746296, 5.1727443, 2.8768554]
2025-08-07 04:03:08,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 37.0, 23.0, 45.0, 18.0, 43.0, 22.0, 11.0, 17.0, 15.0]
2025-08-07 04:03:08,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (5.91) for latency ExtremeSparseL4U32
2025-08-07 04:03:08,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 34 minutes, 15 seconds)
2025-08-07 04:04:48,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:49,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 15.82233 ± 18.934
2025-08-07 04:04:49,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [33.204918, -0.21737121, 18.149416, 8.666063, 3.0738685, 9.582285, 65.725914, 2.591203, 7.9771113, 9.469884]
2025-08-07 04:04:49,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [94.0, 13.0, 83.0, 34.0, 16.0, 39.0, 79.0, 15.0, 17.0, 25.0]
2025-08-07 04:04:49,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (15.82) for latency ExtremeSparseL4U32
2025-08-07 04:04:49,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 38 minutes, 25 seconds)
2025-08-07 04:06:30,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:30,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 14.72795 ± 38.095
2025-08-07 04:06:30,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-27.418974, 6.83318, 6.5201845, 1.7590197, 5.256083, 124.58101, 12.14828, 6.236894, 10.061681, 1.3021458]
2025-08-07 04:06:30,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [120.0, 21.0, 32.0, 21.0, 26.0, 112.0, 23.0, 17.0, 22.0, 24.0]
2025-08-07 04:06:30,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 39 minutes, 24 seconds)
2025-08-07 04:08:12,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:12,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 10.87789 ± 11.824
2025-08-07 04:08:12,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.921241, -0.5275267, 3.2689672, 4.2248063, 1.3102975, 36.654045, 3.1206825, 26.766521, 19.334097, 7.70581]
2025-08-07 04:08:12,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 25.0, 20.0, 20.0, 20.0, 60.0, 15.0, 55.0, 46.0, 26.0]
2025-08-07 04:08:12,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 39 minutes, 6 seconds)
2025-08-07 04:09:53,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:09:54,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 5.42294 ± 3.817
2025-08-07 04:09:54,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.3771118, 3.816279, 3.18722, 13.702459, 7.3271275, 6.6878877, 3.5533462, -0.12756088, 5.59111, 9.114468]
2025-08-07 04:09:54,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 19.0, 23.0, 28.0, 38.0, 19.0, 18.0, 35.0, 19.0, 31.0]
2025-08-07 04:09:54,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 38 minutes, 6 seconds)
2025-08-07 04:11:35,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:11:35,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 6.91548 ± 4.340
2025-08-07 04:11:35,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.9543304, 2.1928885, 2.261239, 2.4762006, 3.23025, 15.158037, 6.752107, 10.34666, 12.16315, 8.619978]
2025-08-07 04:11:35,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 13.0, 15.0, 18.0, 13.0, 27.0, 20.0, 25.0, 22.0, 18.0]
2025-08-07 04:11:35,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 38 minutes, 51 seconds)
2025-08-07 04:13:16,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:13:17,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 8.05447 ± 6.627
2025-08-07 04:13:17,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.6911342, 4.7983913, 8.961166, 16.746449, 10.53425, 22.851372, 0.14981088, 4.69792, 4.7381735, 4.3760037]
2025-08-07 04:13:17,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 20.0, 25.0, 31.0, 30.0, 32.0, 12.0, 16.0, 16.0, 16.0]
2025-08-07 04:13:17,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 37 minutes, 30 seconds)
2025-08-07 04:14:58,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:14:58,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 10.12117 ± 10.334
2025-08-07 04:14:58,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.6304747, 23.168299, 4.0766377, 9.716013, 3.9634616, 33.73676, 1.2722243, 3.4813645, 16.032892, 4.1335297]
2025-08-07 04:14:58,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 74.0, 15.0, 32.0, 16.0, 71.0, 23.0, 29.0, 29.0, 17.0]
2025-08-07 04:14:59,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 35 minutes, 49 seconds)
2025-08-07 04:16:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:16:40,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 22.18801 ± 25.542
2025-08-07 04:16:40,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.5126886, 11.787472, 77.46328, 8.689092, 1.1290183, 1.8383911, 10.723162, 65.12582, 11.484269, 26.12689]
2025-08-07 04:16:40,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 30.0, 169.0, 19.0, 19.0, 12.0, 21.0, 134.0, 26.0, 87.0]
2025-08-07 04:16:40,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (22.19) for latency ExtremeSparseL4U32
2025-08-07 04:16:40,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 34 minutes, 5 seconds)
2025-08-07 04:18:21,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:18:21,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 5.35863 ± 14.889
2025-08-07 04:18:21,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-32.397602, 2.1894276, 21.756063, 0.03344381, 19.463644, 10.258214, 1.1139141, 4.3973956, 5.8563843, 20.915459]
2025-08-07 04:18:21,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [113.0, 16.0, 57.0, 16.0, 99.0, 33.0, 17.0, 30.0, 26.0, 105.0]
2025-08-07 04:18:21,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 32 minutes, 14 seconds)
2025-08-07 04:20:04,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:20:04,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 7.86802 ± 6.187
2025-08-07 04:20:04,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [8.102992, 21.23804, 4.390501, 14.797445, 11.993556, 1.4903127, -0.23644635, 6.914482, 3.7484398, 6.240873]
2025-08-07 04:20:04,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 95.0, 22.0, 45.0, 27.0, 23.0, 17.0, 32.0, 19.0, 16.0]
2025-08-07 04:20:04,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 31 minutes)
2025-08-07 04:21:45,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:21:45,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 24.66769 ± 63.674
2025-08-07 04:21:45,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.034834, 0.22324811, 7.1433296, -6.237509, 9.998489, 2.5917077, 215.20128, 6.122871, 5.66063, -1.0619618]
2025-08-07 04:21:45,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 17.0, 22.0, 82.0, 23.0, 21.0, 151.0, 32.0, 22.0, 23.0]
2025-08-07 04:21:45,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (24.67) for latency ExtremeSparseL4U32
2025-08-07 04:21:45,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 29 minutes, 14 seconds)
2025-08-07 04:23:27,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:23:27,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 6.51132 ± 5.430
2025-08-07 04:23:27,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [16.33169, 1.9817754, 3.7050753, 3.4477608, 6.1355352, 4.6267242, 14.90409, 4.3409195, 10.985377, -1.3457881]
2025-08-07 04:23:27,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 19.0, 28.0, 14.0, 18.0, 23.0, 30.0, 19.0, 19.0, 19.0]
2025-08-07 04:23:27,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 27 minutes, 33 seconds)
2025-08-07 04:25:10,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:25:10,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 13.17451 ± 15.016
2025-08-07 04:25:10,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.0263665, 7.3153048, 5.3431997, 3.6648564, 1.7935188, 47.437275, 36.727177, 3.056161, 12.267777, 11.113451]
2025-08-07 04:25:10,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 29.0, 16.0, 27.0, 25.0, 74.0, 65.0, 14.0, 23.0, 23.0]
2025-08-07 04:25:10,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 26 minutes, 7 seconds)
2025-08-07 04:26:51,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:26:51,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 7.10007 ± 7.418
2025-08-07 04:26:51,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.558603, 1.3460785, 28.07119, 2.5494833, 5.3254824, 3.5117042, 9.422187, 1.2707412, 6.1610184, 6.7842307]
2025-08-07 04:26:51,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 15.0, 102.0, 16.0, 21.0, 13.0, 35.0, 19.0, 17.0, 27.0]
2025-08-07 04:26:52,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 24 minutes, 32 seconds)
2025-08-07 04:28:32,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:28:33,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 23.24877 ± 46.407
2025-08-07 04:28:33,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.1927114, 161.07817, 3.7956095, 7.7464457, 1.7984875, 2.2346249, 25.573048, 11.579231, 5.0287175, 8.460625]
2025-08-07 04:28:33,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 136.0, 25.0, 28.0, 16.0, 26.0, 223.0, 29.0, 23.0, 19.0]
2025-08-07 04:28:33,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 22 minutes, 34 seconds)
2025-08-07 04:30:14,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:30:15,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 32.33947 ± 70.529
2025-08-07 04:30:15,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.5313773, 8.530702, -9.382417, 11.652834, 6.8539534, 5.618691, 11.71734, 36.533684, 3.889996, 241.4485]
2025-08-07 04:30:15,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 22.0, 125.0, 36.0, 20.0, 19.0, 31.0, 74.0, 23.0, 169.0]
2025-08-07 04:30:15,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (32.34) for latency ExtremeSparseL4U32
2025-08-07 04:30:15,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 20 minutes, 54 seconds)
2025-08-07 04:31:56,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:31:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 24.13807 ± 30.728
2025-08-07 04:31:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.238412, -0.5615574, 71.560524, 8.446165, 3.3987763, 6.733327, 13.056805, 3.7299697, 90.53916, 40.239132]
2025-08-07 04:31:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 17.0, 133.0, 23.0, 17.0, 22.0, 34.0, 16.0, 128.0, 67.0]
2025-08-07 04:31:57,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 19 minutes, 10 seconds)
2025-08-07 04:33:38,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:33:38,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 18.53861 ± 30.170
2025-08-07 04:33:38,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.7151136, 8.195123, 4.53743, 44.43088, 101.28912, -0.3819565, 7.311736, 10.936067, -0.3417749, 4.6943784]
2025-08-07 04:33:38,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 23.0, 26.0, 92.0, 157.0, 20.0, 23.0, 27.0, 17.0, 22.0]
2025-08-07 04:33:38,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 17 minutes, 12 seconds)
2025-08-07 04:35:19,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:35:20,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 37.18386 ± 53.529
2025-08-07 04:35:20,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [9.6031, 11.423992, 13.439341, 8.053388, 10.235585, 6.906296, 44.96876, 184.55496, 6.9670887, 75.68606]
2025-08-07 04:35:20,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 26.0, 28.0, 26.0, 29.0, 26.0, 70.0, 133.0, 19.0, 176.0]
2025-08-07 04:35:20,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (37.18) for latency ExtremeSparseL4U32
2025-08-07 04:35:20,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 15 minutes, 31 seconds)
2025-08-07 04:37:01,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:37:01,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 23.59195 ± 46.035
2025-08-07 04:37:01,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.02740806, 7.7810555, 24.547293, 5.4408345, 19.186546, 4.528894, 4.471256, 3.549207, 6.376149, 160.01086]
2025-08-07 04:37:01,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 32.0, 57.0, 23.0, 41.0, 35.0, 22.0, 19.0, 16.0, 102.0]
2025-08-07 04:37:01,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 13 minutes, 48 seconds)
2025-08-07 04:38:44,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:38:44,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 21.00985 ± 40.777
2025-08-07 04:38:44,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.2821646, 3.4304423, 16.03056, 12.059573, 6.215161, 2.202044, 142.68373, 7.0498843, 11.562537, 5.5823936]
2025-08-07 04:38:44,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 26.0, 30.0, 24.0, 23.0, 108.0, 25.0, 22.0, 23.0]
2025-08-07 04:38:44,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 12 minutes, 29 seconds)
2025-08-07 04:40:24,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:40:24,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 4.86220 ± 4.329
2025-08-07 04:40:24,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.3896613, 11.766005, 0.7459865, -0.6633186, 6.462195, 5.6390653, 2.73768, 8.3195305, 0.753224, 11.471964]
2025-08-07 04:40:24,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 24.0, 22.0, 11.0, 29.0, 21.0, 18.0, 32.0, 14.0, 32.0]
2025-08-07 04:40:24,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 10 minutes, 16 seconds)
2025-08-07 04:42:05,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:42:06,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 10.00819 ± 13.112
2025-08-07 04:42:06,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [18.763012, 4.602634, 3.461348, 2.98107, 44.857147, 15.449719, -1.4956938, 1.101217, 2.1141589, 8.247327]
2025-08-07 04:42:06,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 17.0, 18.0, 20.0, 64.0, 28.0, 13.0, 26.0, 13.0, 23.0]
2025-08-07 04:42:06,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 8 minutes, 33 seconds)
2025-08-07 04:43:47,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:43:47,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 5.76913 ± 5.048
2025-08-07 04:43:47,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.344885, 18.667784, 4.719798, 3.8896677, 3.1245615, 7.5177436, -1.8368233, 4.575067, 8.620938, 5.0677133]
2025-08-07 04:43:47,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 30.0, 24.0, 21.0, 26.0, 16.0, 13.0, 21.0, 30.0, 20.0]
2025-08-07 04:43:48,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 6 minutes, 57 seconds)
2025-08-07 04:45:29,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:45:29,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 10.82078 ± 18.107
2025-08-07 04:45:29,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [9.505202, 1.6190873, 7.307306, 64.32443, 2.0822344, 5.454169, 1.9077784, 0.7567816, 5.275514, 9.975341]
2025-08-07 04:45:29,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 23.0, 20.0, 76.0, 22.0, 17.0, 13.0, 28.0, 19.0, 25.0]
2025-08-07 04:45:29,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 5 minutes, 13 seconds)
2025-08-07 04:47:10,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:47:11,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 12.01790 ± 12.388
2025-08-07 04:47:11,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [8.280163, 14.658748, 13.393666, 4.570856, 4.949582, 10.014511, 47.7299, 5.349749, 6.164909, 5.066893]
2025-08-07 04:47:11,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 27.0, 74.0, 26.0, 15.0, 26.0, 78.0, 23.0, 25.0, 18.0]
2025-08-07 04:47:11,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 3 minutes, 17 seconds)
2025-08-07 04:48:52,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:48:53,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 46.09407 ± 84.925
2025-08-07 04:48:53,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.947016, 4.7353225, -2.219988, 220.8207, 1.491435, 210.67491, 7.4355245, -1.0053822, 7.798456, 8.262725]
2025-08-07 04:48:53,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 17.0, 15.0, 155.0, 31.0, 140.0, 22.0, 24.0, 24.0, 30.0]
2025-08-07 04:48:53,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (46.09) for latency ExtremeSparseL4U32
2025-08-07 04:48:53,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 2 minutes, 11 seconds)
2025-08-07 04:50:35,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:50:36,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 48.65708 ± 85.821
2025-08-07 04:50:36,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.7461188, 4.1334305, 12.877917, 182.83752, 7.259747, 4.111877, 8.652773, 7.397994, 251.95416, 5.5992136]
2025-08-07 04:50:36,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 26.0, 25.0, 132.0, 22.0, 22.0, 19.0, 21.0, 182.0, 32.0]
2025-08-07 04:50:36,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (48.66) for latency ExtremeSparseL4U32
2025-08-07 04:50:36,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 45 seconds)
2025-08-07 04:52:18,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:52:18,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 13.62770 ± 19.794
2025-08-07 04:52:18,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.2844753, 1.3297864, 5.180499, 1.969895, 64.73006, 4.6505294, -0.31752408, 2.0686674, 35.149395, 16.23122]
2025-08-07 04:52:18,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 26.0, 16.0, 19.0, 91.0, 17.0, 11.0, 14.0, 122.0, 26.0]
2025-08-07 04:52:18,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 59 minutes, 12 seconds)
2025-08-07 04:54:00,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:54:01,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 20.60403 ± 48.501
2025-08-07 04:54:01,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-3.221987, 6.990783, 8.480277, 11.440486, 2.8165553, 9.607677, -1.3500711, 4.9207473, 165.46774, 0.8880718]
2025-08-07 04:54:01,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 19.0, 24.0, 25.0, 26.0, 27.0, 27.0, 25.0, 99.0, 25.0]
2025-08-07 04:54:01,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 57 minutes, 41 seconds)
2025-08-07 04:55:43,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:55:45,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 72.58910 ± 106.773
2025-08-07 04:55:45,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.3693233, 1.0750067, 224.04277, 9.701387, 10.251635, 141.0439, 8.700536, 9.555431, 5.0294943, 310.12158]
2025-08-07 04:55:45,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 17.0, 161.0, 21.0, 22.0, 98.0, 21.0, 27.0, 20.0, 259.0]
2025-08-07 04:55:45,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (72.59) for latency ExtremeSparseL4U32
2025-08-07 04:55:45,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 56 minutes, 24 seconds)
2025-08-07 04:57:26,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:57:26,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 30.46045 ± 52.452
2025-08-07 04:57:26,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [9.91997, 10.94527, 119.995735, 148.67259, 1.9145404, 2.5430002, -1.2056906, 6.5041637, 3.5871058, 1.7278361]
2025-08-07 04:57:26,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 21.0, 90.0, 112.0, 19.0, 14.0, 23.0, 18.0, 16.0, 16.0]
2025-08-07 04:57:26,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 54 minutes, 33 seconds)
2025-08-07 04:59:08,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:59:09,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 12.06200 ± 15.102
2025-08-07 04:59:09,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [8.176941, 12.005763, 12.060444, 8.231686, 0.42784032, 13.132782, 3.514121, 2.802961, 4.6155486, 55.65196]
2025-08-07 04:59:09,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 30.0, 36.0, 18.0, 15.0, 103.0, 30.0, 17.0, 15.0, 85.0]
2025-08-07 04:59:09,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 52 minutes, 46 seconds)
2025-08-07 05:00:51,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:00:52,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 35.71356 ± 89.977
2025-08-07 05:00:52,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.0325136, -0.47691116, 2.7128787, 5.0516768, 11.821815, 7.750101, -2.2979481, 305.15244, 8.764327, 16.624714]
2025-08-07 05:00:52,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 24.0, 23.0, 30.0, 33.0, 21.0, 175.0, 18.0, 30.0]
2025-08-07 05:00:52,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 51 minutes, 17 seconds)
2025-08-07 05:02:34,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:02:35,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 11.54273 ± 9.507
2025-08-07 05:02:35,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.1388597, 9.893429, 7.818665, 3.5270312, 10.024565, 9.617437, 7.307435, 37.999588, 16.503256, 9.597069]
2025-08-07 05:02:35,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 32.0, 20.0, 15.0, 21.0, 20.0, 26.0, 154.0, 33.0, 33.0]
2025-08-07 05:02:35,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 49 minutes, 39 seconds)
2025-08-07 05:04:18,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:04:19,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 20.68688 ± 46.356
2025-08-07 05:04:19,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.8503346, 6.2180104, 3.338623, 159.65543, 2.1755388, 7.274174, 7.436813, 3.14757, 6.7649727, 6.007369]
2025-08-07 05:04:19,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 24.0, 101.0, 15.0, 17.0, 21.0, 23.0, 20.0, 24.0]
2025-08-07 05:04:19,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 47 minutes, 57 seconds)
2025-08-07 05:05:59,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:05:59,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 7.15253 ± 2.917
2025-08-07 05:05:59,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.9784157, 9.641777, 5.5249715, 3.8815727, 9.4579, 8.794077, 2.7407894, 10.840455, 9.665378, 7.9999847]
2025-08-07 05:05:59,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 29.0, 21.0, 14.0, 26.0, 25.0, 17.0, 31.0, 27.0, 21.0]
2025-08-07 05:05:59,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 46 minutes)
2025-08-07 05:07:42,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:07:43,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 40.07106 ± 53.741
2025-08-07 05:07:43,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.2921195, 117.69094, 5.965219, 13.433914, 4.5893044, 13.366995, 5.3541484, 159.37169, 2.5559483, 72.0903]
2025-08-07 05:07:43,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 88.0, 20.0, 27.0, 14.0, 31.0, 15.0, 110.0, 20.0, 177.0]
2025-08-07 05:07:43,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 44 minutes, 35 seconds)
2025-08-07 05:09:24,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:09:24,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 4.68233 ± 3.266
2025-08-07 05:09:24,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.5690627, 4.0830255, 3.722829, 1.3880186, 12.514085, 0.31728107, 5.5819187, 5.039435, 7.670067, 2.9375906]
2025-08-07 05:09:24,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 14.0, 29.0, 27.0, 12.0, 20.0, 16.0, 17.0, 16.0]
2025-08-07 05:09:24,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 42 minutes, 27 seconds)
2025-08-07 05:11:06,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:11:07,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 24.01740 ± 45.217
2025-08-07 05:11:07,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [159.05864, 13.209405, 10.502089, 5.374782, 17.42684, 9.690371, 6.4378877, 1.67735, 5.370465, 11.426173]
2025-08-07 05:11:07,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [112.0, 32.0, 27.0, 31.0, 27.0, 23.0, 19.0, 13.0, 21.0, 23.0]
2025-08-07 05:11:07,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 40 minutes, 44 seconds)
2025-08-07 05:12:49,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:12:50,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 49.26640 ± 80.048
2025-08-07 05:12:50,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.3745434, 1.7151738, 7.361177, 251.84975, 114.2488, -0.88897353, -0.57292026, 5.8930945, 108.42908, 4.2542834]
2025-08-07 05:12:50,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 28.0, 25.0, 170.0, 81.0, 14.0, 11.0, 26.0, 116.0, 16.0]
2025-08-07 05:12:50,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 38 minutes, 48 seconds)
2025-08-07 05:14:33,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:14:34,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 8.53659 ± 8.952
2025-08-07 05:14:34,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.782243, 9.202199, 3.7614932, 12.331096, 2.375357, 4.203662, 3.4691844, 34.02436, 6.0968266, 5.1195035]
2025-08-07 05:14:34,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 28.0, 23.0, 27.0, 22.0, 21.0, 14.0, 86.0, 20.0, 18.0]
2025-08-07 05:14:34,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 37 minutes, 48 seconds)
2025-08-07 05:16:14,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:16:15,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 22.13607 ± 35.628
2025-08-07 05:16:15,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.55943716, 2.2996104, 0.860001, -0.47857153, 29.047022, 12.114052, 118.35933, 2.5691297, 5.5256, 50.505108]
2025-08-07 05:16:15,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 17.0, 21.0, 28.0, 108.0, 24.0, 111.0, 29.0, 15.0, 76.0]
2025-08-07 05:16:15,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 35 minutes, 36 seconds)
2025-08-07 05:17:58,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:17:58,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 11.63467 ± 22.756
2025-08-07 05:17:58,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [9.628952, 4.7402673, 1.4406749, 3.8302684, 3.9596658, 79.24658, 8.476334, 3.7887294, 3.5625474, -2.327344]
2025-08-07 05:17:58,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 15.0, 12.0, 18.0, 16.0, 73.0, 20.0, 28.0, 21.0, 15.0]
2025-08-07 05:17:58,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 34 minutes, 9 seconds)
2025-08-07 05:19:39,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:19:39,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 31.57321 ± 49.175
2025-08-07 05:19:39,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.4147387, 91.95204, 158.44165, 5.0163155, 10.135054, 8.368312, 3.5066328, 10.912629, 11.029334, 9.955402]
2025-08-07 05:19:39,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 99.0, 122.0, 23.0, 20.0, 20.0, 21.0, 19.0, 29.0, 21.0]
2025-08-07 05:19:39,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 32 minutes, 12 seconds)
2025-08-07 05:21:20,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:21:21,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 5.46196 ± 3.658
2025-08-07 05:21:21,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [11.672506, 6.8261933, 6.5269966, 2.6307838, 1.8494872, 6.3889318, 9.320767, 3.874252, -1.6128227, 7.142503]
2025-08-07 05:21:21,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [34.0, 19.0, 18.0, 21.0, 16.0, 20.0, 34.0, 17.0, 13.0, 24.0]
2025-08-07 05:21:21,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 16 seconds)
2025-08-07 05:23:00,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:23:00,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 9.64532 ± 6.577
2025-08-07 05:23:00,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [14.217494, 4.6767063, 23.340786, -1.5877941, 6.7764473, 8.9720955, 4.669149, 8.268961, 11.661047, 15.458277]
2025-08-07 05:23:00,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 30.0, 35.0, 15.0, 26.0, 27.0, 16.0, 25.0, 28.0, 32.0]
2025-08-07 05:23:00,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 27 minutes, 47 seconds)
2025-08-07 05:24:40,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:24:40,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 10.28291 ± 5.882
2025-08-07 05:24:40,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [15.738779, 1.6038195, 21.213585, 11.3794, 3.817258, 13.60645, 11.359967, 2.9588084, 8.081977, 13.069007]
2025-08-07 05:24:40,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 18.0, 35.0, 36.0, 15.0, 25.0, 26.0, 17.0, 37.0, 28.0]
2025-08-07 05:24:40,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 25 minutes, 53 seconds)
2025-08-07 05:26:20,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:26:21,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 7.96170 ± 5.417
2025-08-07 05:26:21,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-3.2611475, 5.4026904, 12.885427, 4.3602467, 3.7853987, 15.501293, 11.144669, 6.2826223, 9.797586, 13.718226]
2025-08-07 05:26:21,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 24.0, 27.0, 15.0, 23.0, 26.0, 43.0, 34.0, 26.0, 32.0]
2025-08-07 05:26:21,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 23 minutes, 45 seconds)
2025-08-07 05:28:00,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:28:00,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 6.67204 ± 3.459
2025-08-07 05:28:00,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.195959, 2.4943933, 5.71476, 3.4188557, 6.4848967, 9.213123, 5.595085, 15.419556, 4.415242, 7.768577]
2025-08-07 05:28:00,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 30.0, 31.0, 20.0, 17.0, 33.0, 16.0, 34.0, 19.0, 24.0]
2025-08-07 05:28:01,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 21 minutes, 51 seconds)
2025-08-07 05:29:40,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:29:41,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 9.84826 ± 4.813
2025-08-07 05:29:41,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.630714, 12.212718, 8.219605, 2.1195886, 8.286903, 6.728137, 8.620543, 19.207722, 17.249397, 8.207288]
2025-08-07 05:29:41,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 34.0, 28.0, 25.0, 23.0, 33.0, 25.0, 34.0, 34.0, 31.0]
2025-08-07 05:29:41,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 19 minutes, 59 seconds)
2025-08-07 05:31:20,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:31:21,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 8.91901 ± 3.618
2025-08-07 05:31:21,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [14.600524, 15.80636, 8.106025, 4.8332734, 7.798852, 9.705649, 8.40147, 6.0322156, 9.745437, 4.1603003]
2025-08-07 05:31:21,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 29.0, 29.0, 24.0, 27.0, 22.0, 18.0, 16.0, 25.0, 18.0]
2025-08-07 05:31:21,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 21 seconds)
2025-08-07 05:33:00,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:33:01,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 7.77489 ± 5.729
2025-08-07 05:33:01,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [11.369355, 4.073024, 3.338908, 1.4281299, 10.055545, 15.263698, 4.791608, 4.485083, 3.308978, 19.634539]
2025-08-07 05:33:01,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 14.0, 16.0, 13.0, 26.0, 31.0, 31.0, 24.0, 20.0, 37.0]
2025-08-07 05:33:01,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 42 seconds)
2025-08-07 05:34:43,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:34:45,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 177.39154 ± 230.316
2025-08-07 05:34:45,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.0769615, 394.23395, 12.530929, 5.512165, 7.2837806, 631.4976, 497.09473, 4.803779, 208.51724, 6.3643093]
2025-08-07 05:34:45,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 192.0, 23.0, 17.0, 21.0, 326.0, 228.0, 19.0, 124.0, 23.0]
2025-08-07 05:34:45,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (177.39) for latency ExtremeSparseL4U32
2025-08-07 05:34:45,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 15 minutes, 37 seconds)
2025-08-07 05:36:26,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:36:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 12.02563 ± 10.498
2025-08-07 05:36:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [18.16275, 2.6590176, 4.1987348, 6.0494714, 38.05273, 9.856489, 1.7979021, 5.5019746, 17.75959, 16.217602]
2025-08-07 05:36:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 21.0, 19.0, 22.0, 77.0, 21.0, 17.0, 20.0, 30.0, 32.0]
2025-08-07 05:36:27,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 14 minutes, 13 seconds)
2025-08-07 05:38:08,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:38:09,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 38.19914 ± 88.306
2025-08-07 05:38:09,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.22536354, 3.66657, -5.956696, 13.521359, 9.798227, 11.270345, 31.524534, 301.6362, 6.571366, 9.734096]
2025-08-07 05:38:09,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 13.0, 25.0, 28.0, 24.0, 21.0, 62.0, 175.0, 32.0, 22.0]
2025-08-07 05:38:09,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 12 minutes, 53 seconds)
2025-08-07 05:39:52,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:39:52,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 41.90035 ± 77.993
2025-08-07 05:39:52,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.3327283, 1.3792112, 5.5158954, 187.22122, 4.2154984, 3.9671118, 3.7703218, 3.7528808, 207.9366, -0.08799304]
2025-08-07 05:39:52,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 23.0, 159.0, 16.0, 21.0, 29.0, 17.0, 134.0, 16.0]
2025-08-07 05:39:52,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 38 seconds)
2025-08-07 05:41:34,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:41:35,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 61.32400 ± 99.158
2025-08-07 05:41:35,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [36.635387, 232.60231, 1.0912406, 5.4130416, 7.0120125, 282.06705, -0.19136538, 10.537688, 20.828957, 17.24367]
2025-08-07 05:41:35,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [122.0, 149.0, 15.0, 32.0, 22.0, 185.0, 13.0, 20.0, 77.0, 31.0]
2025-08-07 05:41:35,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 10 minutes, 16 seconds)
2025-08-07 05:43:18,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:43:19,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 61.49675 ± 115.844
2025-08-07 05:43:19,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [9.160918, 4.273384, 1.1403294, 245.53645, 1.729577, 2.2775548, 333.89645, 9.1606655, 6.7669187, 1.025229]
2025-08-07 05:43:19,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 22.0, 13.0, 155.0, 11.0, 13.0, 199.0, 23.0, 26.0, 15.0]
2025-08-07 05:43:19,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 8 minutes, 35 seconds)
2025-08-07 05:45:02,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:45:02,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 30.76589 ± 56.814
2025-08-07 05:45:02,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [14.235178, 26.513416, 18.344614, 3.7458737, 12.820352, 5.814119, 5.6143417, 9.392828, 200.11816, 11.060039]
2025-08-07 05:45:02,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 82.0, 32.0, 15.0, 32.0, 21.0, 20.0, 21.0, 123.0, 26.0]
2025-08-07 05:45:02,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 7 minutes, 3 seconds)
2025-08-07 05:46:44,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:46:45,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 80.05933 ± 136.064
2025-08-07 05:46:45,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-2.191259, 6.690337, 83.95152, 6.666062, 409.47925, 272.46133, 3.1575446, 6.7002907, 2.964532, 10.713697]
2025-08-07 05:46:45,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 29.0, 142.0, 17.0, 183.0, 175.0, 15.0, 20.0, 26.0, 23.0]
2025-08-07 05:46:45,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 5 minutes, 23 seconds)
2025-08-07 05:48:28,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:48:29,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 76.53370 ± 160.114
2025-08-07 05:48:29,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.8365066, 8.579833, 3.0958352, -0.41088346, 7.15253, 1.8307778, 516.1805, 219.83194, 2.6759667, 4.5639296]
2025-08-07 05:48:29,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 21.0, 17.0, 20.0, 19.0, 17.0, 292.0, 139.0, 21.0, 16.0]
2025-08-07 05:48:29,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 44 seconds)
2025-08-07 05:50:12,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:50:13,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 118.38661 ± 155.467
2025-08-07 05:50:13,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [36.83967, 390.44888, 7.2457523, 7.702752, 58.269016, 3.7226393, 323.45956, 346.1947, 5.0113325, 4.971695]
2025-08-07 05:50:13,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [109.0, 189.0, 18.0, 21.0, 106.0, 27.0, 170.0, 174.0, 19.0, 18.0]
2025-08-07 05:50:13,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 2 minutes, 12 seconds)
2025-08-07 05:51:56,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:51:56,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 41.65236 ± 72.021
2025-08-07 05:51:56,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [12.989588, 209.72104, 157.53099, 3.2145889, 5.3759923, 7.355397, 1.3552204, 10.059011, 7.3147044, 1.6070404]
2025-08-07 05:51:56,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 122.0, 113.0, 23.0, 16.0, 23.0, 12.0, 30.0, 28.0, 27.0]
2025-08-07 05:51:56,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 20 seconds)
2025-08-07 05:53:39,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:53:40,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 77.20253 ± 159.074
2025-08-07 05:53:40,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.090122, 521.73065, 2.3287766, 14.424217, 4.715116, 5.7458115, 6.7714868, 200.10907, 5.9730916, 5.13688]
2025-08-07 05:53:40,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 295.0, 18.0, 26.0, 15.0, 15.0, 31.0, 115.0, 32.0, 27.0]
2025-08-07 05:53:40,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 42 seconds)
2025-08-07 05:55:24,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:55:25,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 47.66396 ± 85.499
2025-08-07 05:55:25,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [91.452255, 63.93066, 4.434686, 5.0604687, 3.242913, 3.9669654, 4.0190687, 288.4527, 4.7607102, 7.3192277]
2025-08-07 05:55:25,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 99.0, 17.0, 15.0, 26.0, 18.0, 17.0, 143.0, 22.0, 21.0]
2025-08-07 05:55:25,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 7 seconds)
2025-08-07 05:57:07,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:57:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 69.22018 ± 188.986
2025-08-07 05:57:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [10.366963, 17.85885, 13.1415825, -1.1758567, 4.5512714, 635.8863, 0.2438092, 9.868833, 0.8993312, 0.56074846]
2025-08-07 05:57:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 25.0, 26.0, 23.0, 23.0, 292.0, 19.0, 21.0, 13.0, 26.0]
2025-08-07 05:57:08,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 21 seconds)
2025-08-07 05:58:53,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:58:54,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 45.91688 ± 118.074
2025-08-07 05:58:54,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.1777215, 1.8350534, 5.199384, -0.78416395, 5.248251, 399.83466, 2.6756, 13.205639, 16.120865, 9.655753]
2025-08-07 05:58:54,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 27.0, 17.0, 14.0, 17.0, 253.0, 29.0, 30.0, 29.0, 29.0]
2025-08-07 05:58:54,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 53 minutes, 49 seconds)
2025-08-07 06:00:33,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:00:34,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 18.13395 ± 42.960
2025-08-07 06:00:34,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.9045396, 7.0833044, -2.0724492, 3.0460234, 1.2257643, 145.82378, 18.580181, 8.196254, 0.675855, -0.31472093]
2025-08-07 06:00:34,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 23.0, 21.0, 13.0, 16.0, 134.0, 84.0, 33.0, 19.0, 16.0]
2025-08-07 06:00:34,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 42 seconds)
2025-08-07 06:02:17,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:02:18,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 33.22777 ± 68.800
2025-08-07 06:02:18,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.998118, 8.475137, 233.92615, 2.5331416, 3.783431, 58.177654, -0.729904, 4.879836, 6.351571, 6.8825784]
2025-08-07 06:02:18,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 35.0, 135.0, 30.0, 20.0, 128.0, 25.0, 16.0, 18.0, 18.0]
2025-08-07 06:02:18,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 50 minutes)
2025-08-07 06:04:01,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:04:02,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 60.38690 ± 108.964
2025-08-07 06:04:02,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.65666896, 6.134458, 290.39517, 12.089707, 3.660397, 5.550302, 6.6911864, 5.5079727, 7.6559396, 265.52722]
2025-08-07 06:04:02,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 18.0, 152.0, 27.0, 14.0, 16.0, 19.0, 32.0, 21.0, 136.0]
2025-08-07 06:04:02,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 15 seconds)
2025-08-07 06:05:45,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:05:47,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 134.08749 ± 227.217
2025-08-07 06:05:47,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [409.7407, 13.495008, 4.870903, 17.382467, 8.005191, 7.3945327, 159.40654, 708.56995, 5.995538, 6.0142794]
2025-08-07 06:05:47,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 32.0, 17.0, 29.0, 21.0, 27.0, 125.0, 469.0, 19.0, 19.0]
2025-08-07 06:05:47,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 38 seconds)
2025-08-07 06:07:29,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:07:30,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 5.02145 ± 5.149
2025-08-07 06:07:30,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [8.938702, 3.1268175, 2.8783438, -4.3713846, 7.281902, 3.5174096, 10.641281, 3.1389086, 14.540949, 0.5216102]
2025-08-07 06:07:30,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 14.0, 23.0, 28.0, 27.0, 20.0, 18.0, 35.0, 25.0, 26.0]
2025-08-07 06:07:30,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 40 seconds)
2025-08-07 06:09:12,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:09:13,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 81.87892 ± 152.384
2025-08-07 06:09:13,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.257841, 11.30465, 10.451606, 2.7119436, 8.81238, 422.13156, -1.6835495, 10.3033085, 347.26746, 4.2320514]
2025-08-07 06:09:13,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 22.0, 20.0, 29.0, 31.0, 237.0, 11.0, 28.0, 174.0, 30.0]
2025-08-07 06:09:13,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 15 seconds)
2025-08-07 06:10:56,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:10:57,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 16.33828 ± 34.087
2025-08-07 06:10:57,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.1030016, 9.626364, -3.255864, -0.27940783, 3.0944192, 2.793724, 117.68997, 13.340303, 7.415132, 6.8551726]
2025-08-07 06:10:57,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 32.0, 29.0, 23.0, 31.0, 16.0, 88.0, 26.0, 22.0, 23.0]
2025-08-07 06:10:57,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 31 seconds)
2025-08-07 06:12:39,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:12:40,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 54.96477 ± 119.805
2025-08-07 06:12:40,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.835339, 128.54092, 7.2379937, -3.0509436, 1.1216881, -2.034263, 8.187701, 6.961945, 3.5936985, 396.25363]
2025-08-07 06:12:40,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 169.0, 19.0, 25.0, 24.0, 29.0, 21.0, 30.0, 22.0, 212.0]
2025-08-07 06:12:40,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 42 seconds)
2025-08-07 06:14:24,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:14:25,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 36.43145 ± 85.762
2025-08-07 06:14:25,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [292.57764, 28.499758, -4.7829814, 12.195775, 5.2000623, 2.0815985, 7.160477, 6.976927, 8.584783, 5.820532]
2025-08-07 06:14:25,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 84.0, 26.0, 25.0, 19.0, 17.0, 20.0, 18.0, 19.0, 22.0]
2025-08-07 06:14:25,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes)
2025-08-07 06:16:06,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:16:06,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 47.21397 ± 120.049
2025-08-07 06:16:06,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [9.315666, 407.17728, 12.453411, 12.041985, 4.518607, 2.1761045, 8.097817, 5.3645864, 0.5773695, 10.416838]
2025-08-07 06:16:06,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 200.0, 29.0, 33.0, 21.0, 21.0, 25.0, 22.0, 19.0, 21.0]
2025-08-07 06:16:06,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 10 seconds)
2025-08-07 06:17:51,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:17:51,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 7.24785 ± 2.502
2025-08-07 06:17:51,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [9.92851, 7.85681, 9.798539, 3.676813, 7.626795, 3.2710536, 5.9375725, 6.4112763, 6.603593, 11.367578]
2025-08-07 06:17:51,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 25.0, 20.0, 18.0, 14.0, 30.0, 20.0, 19.0, 22.0]
2025-08-07 06:17:51,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 34 seconds)
2025-08-07 06:19:32,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:19:33,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 58.68774 ± 134.751
2025-08-07 06:19:33,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.6519101, 0.7005655, 2.9605396, 454.48752, -2.8218048, 94.54009, 5.515843, 7.040322, 3.4394019, 20.363005]
2025-08-07 06:19:33,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 14.0, 26.0, 240.0, 25.0, 134.0, 17.0, 19.0, 17.0, 65.0]
2025-08-07 06:19:33,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 42 seconds)
2025-08-07 06:21:20,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:21:21,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 104.42351 ± 197.512
2025-08-07 06:21:21,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.439004, 1.7509513, 4.2118807, 16.78033, 2.4001095, 9.780474, 573.42755, 19.73347, 1.9722306, 411.73904]
2025-08-07 06:21:21,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 21.0, 17.0, 30.0, 12.0, 23.0, 535.0, 31.0, 14.0, 217.0]
2025-08-07 06:21:21,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 17 seconds)
2025-08-07 06:23:01,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:23:03,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 191.56374 ± 383.580
2025-08-07 06:23:03,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.104281, 0.47894305, 746.58484, 1.187638, 2.4535758, 8.040565, 19.775305, 0.24231942, -0.7827574, 1131.5526]
2025-08-07 06:23:03,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 31.0, 400.0, 16.0, 17.0, 36.0, 29.0, 30.0, 21.0, 619.0]
2025-08-07 06:23:03,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (191.56) for latency ExtremeSparseL4U32
2025-08-07 06:23:03,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 22 seconds)
2025-08-07 06:24:46,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:24:47,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 18.37189 ± 37.709
2025-08-07 06:24:47,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.758541, 0.3695216, 8.764404, 5.2737064, -0.34625182, 131.06741, 8.41473, 10.050996, 8.515924, 5.8499303]
2025-08-07 06:24:47,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 17.0, 19.0, 31.0, 14.0, 94.0, 32.0, 22.0, 30.0, 25.0]
2025-08-07 06:24:47,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 45 seconds)
2025-08-07 06:26:29,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:26:30,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 42.33789 ± 101.426
2025-08-07 06:26:30,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.4764023, 5.4176297, 7.160849, 7.431581, 15.835895, 6.006996, 20.566507, 1.3165915, 346.20636, 5.96012]
2025-08-07 06:26:30,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 17.0, 21.0, 23.0, 31.0, 27.0, 30.0, 13.0, 164.0, 16.0]
2025-08-07 06:26:30,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 55 seconds)
2025-08-07 06:28:12,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:28:14,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 134.71931 ± 177.010
2025-08-07 06:28:14,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [8.545737, 3.7181187, 515.7353, 343.70493, 3.05872, 5.685905, 8.277479, 287.9489, 166.04913, 4.468968]
2025-08-07 06:28:14,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 26.0, 276.0, 199.0, 26.0, 28.0, 19.0, 203.0, 89.0, 17.0]
2025-08-07 06:28:14,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 17 seconds)
2025-08-07 06:29:57,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:29:58,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 24.68289 ± 47.593
2025-08-07 06:29:58,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [13.521188, 9.462214, 21.649567, 12.443452, 1.3880012, 6.3535995, 1.6102846, 4.594877, 9.391557, 166.4142]
2025-08-07 06:29:58,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 21.0, 33.0, 25.0, 14.0, 21.0, 13.0, 17.0, 24.0, 264.0]
2025-08-07 06:29:58,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 23 seconds)
2025-08-07 06:31:40,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:31:41,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 69.90177 ± 112.580
2025-08-07 06:31:41,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.68569, 186.55045, 9.200119, 6.614174, 349.7776, -2.2662904, 2.237356, 135.17027, 3.4685862, 5.57976]
2025-08-07 06:31:41,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 133.0, 19.0, 27.0, 173.0, 9.0, 13.0, 89.0, 19.0, 30.0]
2025-08-07 06:31:41,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 42 seconds)
2025-08-07 06:33:25,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:33:27,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 157.35516 ± 201.061
2025-08-07 06:33:27,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.50025445, 5.112834, 40.60121, 166.78358, 6.1568966, 483.89178, 9.788705, 3.449859, 530.168, 327.0985]
2025-08-07 06:33:27,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 21.0, 85.0, 210.0, 22.0, 290.0, 20.0, 16.0, 288.0, 169.0]
2025-08-07 06:33:27,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 3 seconds)
2025-08-07 06:35:09,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:35:10,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 7.72872 ± 6.648
2025-08-07 06:35:10,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.8211482, -1.2659055, 11.815166, 4.181964, 13.435969, 0.17949389, 20.927748, 3.9439132, 8.092293, 13.155388]
2025-08-07 06:35:10,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 23.0, 19.0, 18.0, 26.0, 21.0, 163.0, 19.0, 30.0, 32.0]
2025-08-07 06:35:10,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 19 seconds)
2025-08-07 06:36:52,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:36:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 47.73521 ± 105.699
2025-08-07 06:36:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.8534684, -1.3770703, 29.835052, 56.868996, -0.5638955, 8.527912, 10.190668, 360.54544, 12.409237, -4.937788]
2025-08-07 06:36:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 88.0, 96.0, 26.0, 19.0, 22.0, 173.0, 29.0, 25.0]
2025-08-07 06:36:53,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 33 seconds)
2025-08-07 06:38:35,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:38:37,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 136.24271 ± 267.146
2025-08-07 06:38:37,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.7527804, 775.70166, 18.29065, -3.2747898, 3.500818, 1.0063034, 7.68602, 0.7948184, 6.933575, 545.03516]
2025-08-07 06:38:37,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 400.0, 29.0, 23.0, 30.0, 15.0, 24.0, 13.0, 26.0, 261.0]
2025-08-07 06:38:37,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 49 seconds)
2025-08-07 06:40:19,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:40:20,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 30.66934 ± 71.746
2025-08-07 06:40:20,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.2761016, 10.655821, 13.378095, 5.602477, 2.992203, 8.458215, 245.66946, 7.0671563, 5.7265506, 0.8673018]
2025-08-07 06:40:20,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 22.0, 32.0, 18.0, 13.0, 30.0, 190.0, 22.0, 17.0, 14.0]
2025-08-07 06:40:20,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 6 seconds)
2025-08-07 06:42:02,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:42:03,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 55.68163 ± 104.492
2025-08-07 06:42:03,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [8.074332, 175.56345, 329.9486, 0.6466616, 5.5411663, 14.210665, 5.984, -0.92445636, 6.759461, 11.012443]
2025-08-07 06:42:03,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 114.0, 171.0, 19.0, 18.0, 31.0, 27.0, 26.0, 19.0, 26.0]
2025-08-07 06:42:03,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 19 seconds)
2025-08-07 06:43:46,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:43:47,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 48.77739 ± 125.265
2025-08-07 06:43:47,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [11.589856, 1.8000858, 14.11315, 4.514246, 3.1418045, 3.8890014, 424.35455, 6.9161205, 3.9463148, 13.508818]
2025-08-07 06:43:47,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 17.0, 23.0, 17.0, 16.0, 15.0, 224.0, 35.0, 18.0, 23.0]
2025-08-07 06:43:47,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 37 seconds)
2025-08-07 06:45:30,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:45:30,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 61.42355 ± 131.287
2025-08-07 06:45:30,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.4490213, 9.223478, 4.625998, 10.228516, -2.0120716, 7.240586, 439.01538, 3.3783147, 130.40234, 5.6839056]
2025-08-07 06:45:30,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 20.0, 16.0, 27.0, 14.0, 20.0, 209.0, 14.0, 90.0, 18.0]
2025-08-07 06:45:30,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 54 seconds)
2025-08-07 06:47:09,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:47:10,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 49.51673 ± 131.597
2025-08-07 06:47:10,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [444.14706, 1.2578899, 0.70188063, 10.829277, 11.989293, 7.109935, 1.7120105, 8.562407, 4.0162444, 4.8413053]
2025-08-07 06:47:10,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [182.0, 28.0, 30.0, 31.0, 26.0, 21.0, 18.0, 20.0, 17.0, 21.0]
2025-08-07 06:47:10,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 7 seconds)
2025-08-07 06:48:51,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:48:52,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 64.09618 ± 174.109
2025-08-07 06:48:52,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.106237, 3.7323518, 17.768097, 3.3933563, 586.256, 3.120113, 2.317223, 5.3195558, 6.174116, 9.7746725]
2025-08-07 06:48:52,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 17.0, 33.0, 16.0, 311.0, 30.0, 23.0, 15.0, 22.0, 22.0]
2025-08-07 06:48:52,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 24 seconds)
2025-08-07 06:50:32,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:50:33,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 72.51283 ± 116.153
2025-08-07 06:50:33,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.841453, -0.17580295, 201.36682, 8.185419, 0.26728415, 362.22192, 19.023138, 1.075325, 123.51927, 2.8034787]
2025-08-07 06:50:33,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 16.0, 139.0, 24.0, 26.0, 215.0, 33.0, 25.0, 101.0, 22.0]
2025-08-07 06:50:33,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 41 seconds)
2025-08-07 06:52:16,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:52:17,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 37.75940 ± 84.013
2025-08-07 06:52:17,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.9729156, 1.2820435, 5.302752, 43.658085, 5.123011, 286.9788, 21.101862, 0.15501732, 2.3248591, 6.694697]
2025-08-07 06:52:17,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 27.0, 21.0, 88.0, 26.0, 142.0, 26.0, 14.0, 33.0, 24.0]
2025-08-07 06:52:17,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1251 [DEBUG]: Training session finished
