2025-08-07 03:45:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc5-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:45:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc5-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:45:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1524cbf87d50>}
2025-08-07 03:45:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 03:45:32,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 03:45:32,569 baseline-bpql-noiseperc5-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 03:45:32,569 baseline-bpql-noiseperc5-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 03:45:33,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 03:45:33,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 03:47:11,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:47:12,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 64.69663 ± 32.042
2025-08-07 03:47:12,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [21.419344, 19.518667, 89.2634, 81.72264, 83.63908, 59.26892, 130.4346, 41.34275, 69.21926, 51.13767]
2025-08-07 03:47:12,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 97.0, 122.0, 98.0, 83.0, 130.0, 56.0, 69.0, 59.0]
2025-08-07 03:47:12,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (64.70) for latency ExtremeSparseL4U32
2025-08-07 03:47:12,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 43 minutes, 17 seconds)
2025-08-07 03:48:57,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:58,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 23.80809 ± 14.173
2025-08-07 03:48:58,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [25.695816, 28.283062, 18.834978, 52.494656, 27.88307, 0.37370825, 21.879559, 20.362558, 37.5956, 4.677881]
2025-08-07 03:48:58,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [50.0, 131.0, 30.0, 135.0, 180.0, 130.0, 32.0, 42.0, 63.0, 131.0]
2025-08-07 03:48:58,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 47 minutes, 27 seconds)
2025-08-07 03:50:45,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:47,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 14.63337 ± 22.706
2025-08-07 03:50:47,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [10.376499, -5.929241, 39.93309, 12.541396, -11.277969, -14.854769, 54.416954, 20.17558, 41.06088, -0.10870247]
2025-08-07 03:50:47,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [168.0, 137.0, 72.0, 112.0, 146.0, 132.0, 121.0, 174.0, 76.0, 177.0]
2025-08-07 03:50:47,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 49 minutes, 8 seconds)
2025-08-07 03:52:31,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:52:32,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 29.21449 ± 10.954
2025-08-07 03:52:32,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [41.10719, 24.075447, 53.660046, 19.442253, 35.441677, 32.692142, 22.923637, 15.990769, 25.175228, 21.636469]
2025-08-07 03:52:32,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [77.0, 105.0, 64.0, 32.0, 72.0, 47.0, 36.0, 133.0, 118.0, 33.0]
2025-08-07 03:52:32,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 47 minutes, 42 seconds)
2025-08-07 03:54:17,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:18,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 24.90345 ± 11.013
2025-08-07 03:54:18,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [14.131877, 13.193902, 23.484179, 10.714502, 50.814064, 28.289852, 25.07746, 22.93593, 27.580875, 32.81184]
2025-08-07 03:54:18,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 23.0, 47.0, 21.0, 103.0, 73.0, 49.0, 37.0, 41.0, 155.0]
2025-08-07 03:54:18,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 46 minutes, 2 seconds)
2025-08-07 03:56:02,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:03,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 42.75217 ± 26.955
2025-08-07 03:56:03,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [29.881065, 117.5305, 49.42531, 24.018589, 57.03748, 30.157068, 36.08883, 32.56313, 25.08981, 25.7299]
2025-08-07 03:56:03,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [186.0, 149.0, 88.0, 64.0, 114.0, 64.0, 59.0, 40.0, 117.0, 43.0]
2025-08-07 03:56:03,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 46 minutes, 24 seconds)
2025-08-07 03:57:48,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:57:50,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 75.34452 ± 47.289
2025-08-07 03:57:50,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [168.74356, 52.53108, 26.702942, 62.086296, 57.152954, 51.565453, 133.61975, 130.56284, 38.291805, 32.188564]
2025-08-07 03:57:50,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 153.0, 51.0, 76.0, 64.0, 73.0, 158.0, 254.0, 144.0, 43.0]
2025-08-07 03:57:50,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (75.34) for latency ExtremeSparseL4U32
2025-08-07 03:57:50,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 44 minutes, 49 seconds)
2025-08-07 03:59:36,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:59:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 88.13906 ± 71.947
2025-08-07 03:59:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [15.443171, 49.835556, 229.69391, 114.56611, 60.097725, 85.139084, 67.7912, 20.873457, 210.18929, 27.761177]
2025-08-07 03:59:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 64.0, 162.0, 183.0, 80.0, 97.0, 89.0, 33.0, 152.0, 40.0]
2025-08-07 03:59:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (88.14) for latency ExtremeSparseL4U32
2025-08-07 03:59:37,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 42 minutes, 39 seconds)
2025-08-07 04:01:23,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:01:24,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 70.26955 ± 51.560
2025-08-07 04:01:24,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [32.357925, 38.313587, 35.527702, 78.006355, 206.4315, 70.40281, 83.090775, 41.613167, 97.561554, 19.39014]
2025-08-07 04:01:24,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [54.0, 55.0, 50.0, 96.0, 164.0, 102.0, 218.0, 63.0, 118.0, 32.0]
2025-08-07 04:01:24,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 41 minutes, 19 seconds)
2025-08-07 04:03:08,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:10,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 99.83430 ± 96.768
2025-08-07 04:03:10,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [235.88177, 122.84299, 18.346632, 17.54787, 27.110157, 26.384342, 260.2969, 40.440865, 224.72119, 24.770226]
2025-08-07 04:03:10,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 209.0, 31.0, 41.0, 38.0, 39.0, 161.0, 51.0, 127.0, 36.0]
2025-08-07 04:03:10,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (99.83) for latency ExtremeSparseL4U32
2025-08-07 04:03:10,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 39 minutes, 38 seconds)
2025-08-07 04:04:53,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:55,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 85.09355 ± 62.260
2025-08-07 04:04:55,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [31.303072, 29.23632, 38.79711, 92.487785, 179.88857, 188.89165, 29.156616, 41.716434, 61.943813, 157.51425]
2025-08-07 04:04:55,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [39.0, 42.0, 52.0, 78.0, 133.0, 142.0, 38.0, 50.0, 66.0, 120.0]
2025-08-07 04:04:55,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 37 minutes, 38 seconds)
2025-08-07 04:06:40,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:42,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 122.81844 ± 78.720
2025-08-07 04:06:42,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [22.432796, 209.17448, 67.12848, 75.55242, 101.07336, 203.06349, 61.85047, 227.0326, 36.9171, 223.9591]
2025-08-07 04:06:42,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 122.0, 63.0, 77.0, 99.0, 127.0, 58.0, 136.0, 58.0, 155.0]
2025-08-07 04:06:42,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (122.82) for latency ExtremeSparseL4U32
2025-08-07 04:06:42,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 35 minutes, 58 seconds)
2025-08-07 04:08:25,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:27,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 160.72728 ± 92.855
2025-08-07 04:08:27,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [129.99525, 25.523031, 52.80309, 212.70448, 174.06136, 217.95894, 218.92258, 30.529078, 316.6513, 228.12361]
2025-08-07 04:08:27,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [106.0, 40.0, 86.0, 119.0, 99.0, 131.0, 138.0, 40.0, 227.0, 147.0]
2025-08-07 04:08:27,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (160.73) for latency ExtremeSparseL4U32
2025-08-07 04:08:27,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 33 minutes, 35 seconds)
2025-08-07 04:10:12,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:10:13,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 98.07156 ± 75.561
2025-08-07 04:10:13,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [80.83984, 16.540173, 15.350273, 116.835495, 174.85191, 168.7192, 15.865273, 218.77786, 158.83005, 14.105502]
2025-08-07 04:10:13,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [89.0, 30.0, 27.0, 88.0, 112.0, 106.0, 27.0, 129.0, 133.0, 24.0]
2025-08-07 04:10:13,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 31 minutes, 41 seconds)
2025-08-07 04:11:58,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:11:59,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 98.25851 ± 66.103
2025-08-07 04:11:59,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [61.891888, 110.45178, 56.444218, 186.87807, 95.26881, 9.115529, 244.59561, 49.870075, 99.96147, 68.107635]
2025-08-07 04:11:59,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [95.0, 115.0, 62.0, 141.0, 89.0, 21.0, 152.0, 60.0, 113.0, 72.0]
2025-08-07 04:11:59,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 30 minutes, 4 seconds)
2025-08-07 04:13:45,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:13:47,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 188.92288 ± 86.993
2025-08-07 04:13:47,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [197.4146, 225.30032, 295.27338, 138.96057, 339.02032, 261.58362, 56.08463, 82.17497, 141.1684, 152.24811]
2025-08-07 04:13:47,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 132.0, 173.0, 95.0, 289.0, 152.0, 68.0, 69.0, 110.0, 124.0]
2025-08-07 04:13:47,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (188.92) for latency ExtremeSparseL4U32
2025-08-07 04:13:47,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 29 minutes, 10 seconds)
2025-08-07 04:15:32,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:15:34,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 139.62033 ± 101.267
2025-08-07 04:15:34,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [244.93335, 26.110046, 45.537792, 81.17855, 244.36676, 149.02809, 39.896942, 232.84183, 33.481033, 298.82883]
2025-08-07 04:15:34,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 43.0, 48.0, 122.0, 156.0, 117.0, 46.0, 145.0, 47.0, 173.0]
2025-08-07 04:15:34,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 27 minutes, 8 seconds)
2025-08-07 04:17:19,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:17:22,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 236.30795 ± 131.597
2025-08-07 04:17:22,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [442.67334, 116.841736, 303.145, 376.1891, 126.2116, 141.391, 389.0464, 219.56537, 229.65398, 18.362007]
2025-08-07 04:17:22,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 219.0, 158.0, 223.0, 142.0, 249.0, 194.0, 184.0, 345.0, 30.0]
2025-08-07 04:17:22,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (236.31) for latency ExtremeSparseL4U32
2025-08-07 04:17:22,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 26 minutes, 16 seconds)
2025-08-07 04:19:09,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:19:12,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 175.64186 ± 97.931
2025-08-07 04:19:12,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [38.19529, 98.00366, 204.67917, 19.725601, 260.6961, 285.00912, 283.37866, 286.8485, 132.42455, 147.45784]
2025-08-07 04:19:12,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [42.0, 138.0, 230.0, 32.0, 241.0, 171.0, 158.0, 136.0, 157.0, 193.0]
2025-08-07 04:19:12,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 25 minutes, 19 seconds)
2025-08-07 04:20:56,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:20:57,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 72.18341 ± 55.420
2025-08-07 04:20:57,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [118.59318, 18.336817, 102.62903, 84.19109, 34.122868, 44.76627, 208.32138, 33.02622, 46.316437, 31.530857]
2025-08-07 04:20:57,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 30.0, 108.0, 107.0, 44.0, 51.0, 133.0, 42.0, 56.0, 41.0]
2025-08-07 04:20:57,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 23 minutes, 27 seconds)
2025-08-07 04:22:43,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:22:45,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 256.94992 ± 136.943
2025-08-07 04:22:45,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [518.4277, 269.27426, 272.6446, 345.20798, 178.47276, 265.947, 77.98084, 231.33841, 387.68808, 22.517845]
2025-08-07 04:22:45,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [317.0, 162.0, 275.0, 176.0, 166.0, 160.0, 112.0, 119.0, 190.0, 35.0]
2025-08-07 04:22:45,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (256.95) for latency ExtremeSparseL4U32
2025-08-07 04:22:45,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 21 minutes, 38 seconds)
2025-08-07 04:24:31,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:24:34,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 274.33389 ± 193.837
2025-08-07 04:24:34,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [400.48068, 551.02454, 14.473457, 15.3442335, 409.73734, 357.64728, 314.0229, 131.73944, 49.187996, 499.68097]
2025-08-07 04:24:34,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [228.0, 296.0, 23.0, 27.0, 196.0, 154.0, 146.0, 150.0, 52.0, 273.0]
2025-08-07 04:24:34,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (274.33) for latency ExtremeSparseL4U32
2025-08-07 04:24:34,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 20 minutes, 29 seconds)
2025-08-07 04:26:19,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:26:22,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 254.46187 ± 193.561
2025-08-07 04:26:22,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [38.543217, 240.00513, 710.21515, 21.009968, 120.05706, 131.94391, 220.53325, 327.25473, 351.34964, 383.7066]
2025-08-07 04:26:22,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [52.0, 187.0, 521.0, 32.0, 145.0, 136.0, 123.0, 158.0, 168.0, 177.0]
2025-08-07 04:26:22,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 18 minutes, 24 seconds)
2025-08-07 04:28:07,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:28:09,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 169.17607 ± 136.196
2025-08-07 04:28:09,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [281.71884, 77.15039, 161.60437, 345.05963, 32.61672, 21.67623, 96.96793, 240.95607, 19.496367, 414.51425]
2025-08-07 04:28:09,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 69.0, 147.0, 212.0, 44.0, 32.0, 123.0, 127.0, 30.0, 206.0]
2025-08-07 04:28:09,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 16 minutes, 10 seconds)
2025-08-07 04:29:54,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:29:56,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 156.87964 ± 148.904
2025-08-07 04:29:56,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [49.630802, 310.36133, 115.13022, 65.4684, 371.9407, 443.28757, 42.180874, 16.671204, 114.0082, 40.11707]
2025-08-07 04:29:56,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [54.0, 170.0, 135.0, 65.0, 218.0, 234.0, 57.0, 28.0, 161.0, 50.0]
2025-08-07 04:29:56,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 14 minutes, 38 seconds)
2025-08-07 04:31:41,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:31:43,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 182.73047 ± 230.929
2025-08-07 04:31:43,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [54.946938, 148.98024, 50.066994, 22.443447, 45.341812, 17.922644, 238.40758, 491.28885, 24.57254, 733.3338]
2025-08-07 04:31:43,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [60.0, 128.0, 56.0, 33.0, 55.0, 28.0, 231.0, 292.0, 41.0, 360.0]
2025-08-07 04:31:43,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 12 minutes, 41 seconds)
2025-08-07 04:33:30,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:33:32,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 199.80383 ± 180.356
2025-08-07 04:33:32,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [230.94522, 510.81088, 159.22556, 529.5744, 50.640835, 43.524654, 12.543916, 203.81744, 16.090075, 240.86534]
2025-08-07 04:33:32,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 244.0, 195.0, 249.0, 60.0, 51.0, 33.0, 177.0, 26.0, 266.0]
2025-08-07 04:33:32,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 10 minutes, 57 seconds)
2025-08-07 04:35:18,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:35:20,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 203.18777 ± 166.899
2025-08-07 04:35:20,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [191.7846, 464.20038, 35.586178, 226.50665, 24.59593, 32.546917, 255.49556, 484.8314, 289.5672, 26.762985]
2025-08-07 04:35:20,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [182.0, 233.0, 48.0, 183.0, 33.0, 43.0, 163.0, 230.0, 343.0, 43.0]
2025-08-07 04:35:20,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 9 minutes, 16 seconds)
2025-08-07 04:37:06,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:37:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 59.62968 ± 87.399
2025-08-07 04:37:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [13.702025, 16.612494, 37.73856, 20.663506, 317.68716, 17.739008, 40.698822, 56.838356, 18.361403, 56.255463]
2025-08-07 04:37:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 30.0, 47.0, 32.0, 140.0, 30.0, 52.0, 56.0, 29.0, 59.0]
2025-08-07 04:37:06,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 7 minutes, 8 seconds)
2025-08-07 04:38:51,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:38:53,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 223.52310 ± 107.016
2025-08-07 04:38:53,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [270.8884, 308.07486, 120.574165, 270.10516, 366.72638, 247.38533, 347.3091, 35.667007, 182.90724, 85.59329]
2025-08-07 04:38:53,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 161.0, 111.0, 194.0, 184.0, 134.0, 173.0, 48.0, 175.0, 122.0]
2025-08-07 04:38:53,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 5 minutes, 24 seconds)
2025-08-07 04:40:39,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:40:41,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 167.95886 ± 143.622
2025-08-07 04:40:41,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [248.75677, 20.974821, 21.380314, 40.671196, 33.333668, 169.16563, 179.35687, 243.36714, 505.78635, 216.7959]
2025-08-07 04:40:41,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 33.0, 32.0, 52.0, 47.0, 136.0, 154.0, 151.0, 261.0, 190.0]
2025-08-07 04:40:41,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 3 minutes, 38 seconds)
2025-08-07 04:42:26,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:42:28,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 102.90952 ± 119.978
2025-08-07 04:42:28,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [199.92407, 19.767649, 23.862978, 42.563793, 158.03175, 26.338327, 413.26718, 25.932945, 21.951153, 97.45538]
2025-08-07 04:42:28,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 30.0, 35.0, 52.0, 143.0, 41.0, 227.0, 42.0, 42.0, 104.0]
2025-08-07 04:42:28,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 1 minute, 24 seconds)
2025-08-07 04:44:14,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:44:16,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 215.66306 ± 137.779
2025-08-07 04:44:16,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [277.19382, 346.25803, 17.190382, 196.37585, 435.4465, 266.24588, 254.06111, 41.32056, 18.836264, 303.70212]
2025-08-07 04:44:16,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 169.0, 31.0, 161.0, 175.0, 151.0, 202.0, 52.0, 31.0, 168.0]
2025-08-07 04:44:16,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 59 minutes, 38 seconds)
2025-08-07 04:46:02,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:46:04,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 208.13155 ± 135.654
2025-08-07 04:46:04,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [384.0665, 387.65442, 250.49805, 21.108139, 325.86392, 115.30656, 44.826992, 44.96081, 299.21057, 207.81947]
2025-08-07 04:46:04,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 204.0, 117.0, 30.0, 191.0, 123.0, 55.0, 57.0, 140.0, 174.0]
2025-08-07 04:46:04,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 58 minutes, 12 seconds)
2025-08-07 04:47:49,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:47:51,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 277.49496 ± 163.577
2025-08-07 04:47:51,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [236.77014, 455.64584, 16.116201, 251.47568, 463.88608, 105.29673, 425.28564, 382.16962, 44.24886, 394.05502]
2025-08-07 04:47:51,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 218.0, 31.0, 159.0, 284.0, 147.0, 246.0, 176.0, 49.0, 217.0]
2025-08-07 04:47:51,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (277.49) for latency ExtremeSparseL4U32
2025-08-07 04:47:51,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 56 minutes, 32 seconds)
2025-08-07 04:49:38,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:49:40,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 222.62854 ± 141.441
2025-08-07 04:49:40,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [346.01022, 398.68637, 22.284058, 420.02643, 211.34196, 248.81255, 27.724192, 226.03584, 282.54266, 42.82112]
2025-08-07 04:49:40,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 191.0, 31.0, 204.0, 171.0, 168.0, 43.0, 124.0, 155.0, 53.0]
2025-08-07 04:49:40,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 55 minutes, 3 seconds)
2025-08-07 04:51:25,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:51:28,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 205.53218 ± 156.721
2025-08-07 04:51:28,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [16.740555, 464.44208, 147.75937, 150.9801, 474.02454, 195.9629, 220.70567, 302.26678, 61.97025, 20.469545]
2025-08-07 04:51:28,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 330.0, 187.0, 190.0, 301.0, 116.0, 117.0, 311.0, 60.0, 30.0]
2025-08-07 04:51:28,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 53 minutes, 25 seconds)
2025-08-07 04:53:14,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:53:17,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 303.67914 ± 186.758
2025-08-07 04:53:17,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [371.25735, 204.38495, 436.60147, 483.61673, 281.05737, 15.010211, 280.21436, 302.31534, 646.135, 16.198256]
2025-08-07 04:53:17,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 102.0, 183.0, 222.0, 145.0, 25.0, 177.0, 151.0, 355.0, 26.0]
2025-08-07 04:53:17,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (303.68) for latency ExtremeSparseL4U32
2025-08-07 04:53:17,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 51 minutes, 43 seconds)
2025-08-07 04:55:02,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:55:04,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 193.24840 ± 164.401
2025-08-07 04:55:04,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [61.173306, 22.4975, 435.4365, 234.37732, 16.886929, 300.32727, 281.47357, 466.24802, 82.90438, 31.159176]
2025-08-07 04:55:04,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [64.0, 33.0, 215.0, 122.0, 29.0, 152.0, 139.0, 213.0, 128.0, 49.0]
2025-08-07 04:55:04,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 49 minutes, 50 seconds)
2025-08-07 04:56:50,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:56:52,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 174.94159 ± 152.038
2025-08-07 04:56:52,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [14.812852, 18.107763, 122.74074, 34.59185, 351.56534, 405.81702, 125.24445, 20.219154, 345.97403, 310.34277]
2025-08-07 04:56:52,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 32.0, 127.0, 48.0, 167.0, 159.0, 115.0, 31.0, 160.0, 151.0]
2025-08-07 04:56:52,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 48 minutes, 7 seconds)
2025-08-07 04:58:35,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:58:38,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 314.05103 ± 159.709
2025-08-07 04:58:38,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [255.8961, 330.62555, 323.95142, 310.26846, 295.7454, 31.368486, 327.9757, 372.68643, 703.4884, 188.5045]
2025-08-07 04:58:38,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 144.0, 154.0, 192.0, 159.0, 50.0, 183.0, 184.0, 424.0, 108.0]
2025-08-07 04:58:38,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (314.05) for latency ExtremeSparseL4U32
2025-08-07 04:58:38,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 45 minutes, 42 seconds)
2025-08-07 05:00:23,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:00:25,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 115.56986 ± 99.722
2025-08-07 05:00:25,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [25.356802, 194.01483, 308.96457, 65.85201, 181.63081, 36.734726, 37.763657, 238.1556, 43.23813, 23.987453]
2025-08-07 05:00:25,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 121.0, 147.0, 68.0, 134.0, 45.0, 55.0, 125.0, 54.0, 45.0]
2025-08-07 05:00:25,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 43 minutes, 49 seconds)
2025-08-07 05:02:11,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:02:13,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 244.36768 ± 163.439
2025-08-07 05:02:13,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [228.38945, 286.82944, 513.06335, 19.919792, 318.45667, 11.128331, 37.395473, 365.42117, 401.93793, 261.13535]
2025-08-07 05:02:13,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 138.0, 219.0, 32.0, 156.0, 23.0, 48.0, 209.0, 181.0, 130.0]
2025-08-07 05:02:13,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 41 minutes, 52 seconds)
2025-08-07 05:03:59,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:04:02,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 276.78790 ± 157.847
2025-08-07 05:04:02,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [20.700356, 371.2502, 244.17491, 248.82504, 339.19794, 396.11807, 570.10986, 264.96124, 15.833259, 296.70813]
2025-08-07 05:04:02,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 223.0, 260.0, 140.0, 164.0, 193.0, 318.0, 156.0, 27.0, 137.0]
2025-08-07 05:04:02,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 40 minutes, 22 seconds)
2025-08-07 05:05:49,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:05:51,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 234.96536 ± 128.144
2025-08-07 05:05:51,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [207.59758, 92.399666, 305.22775, 394.86835, 292.06757, 430.99988, 92.27897, 219.08057, 296.0349, 19.09837]
2025-08-07 05:05:51,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 111.0, 157.0, 200.0, 154.0, 200.0, 112.0, 117.0, 164.0, 28.0]
2025-08-07 05:05:51,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 38 minutes, 49 seconds)
2025-08-07 05:07:35,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:07:38,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 304.35492 ± 277.244
2025-08-07 05:07:38,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [380.97662, 16.898693, 449.3154, 19.842575, 35.44433, 943.1363, 362.47418, 41.03697, 334.50302, 459.92145]
2025-08-07 05:07:38,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [196.0, 28.0, 211.0, 30.0, 54.0, 548.0, 165.0, 56.0, 181.0, 187.0]
2025-08-07 05:07:38,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 37 minutes, 17 seconds)
2025-08-07 05:09:23,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:09:25,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 244.20930 ± 150.550
2025-08-07 05:09:25,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [404.33856, 309.53598, 20.502037, 362.85184, 22.157307, 407.21765, 400.42688, 69.20202, 202.83908, 243.02179]
2025-08-07 05:09:25,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [163.0, 158.0, 29.0, 221.0, 32.0, 187.0, 197.0, 64.0, 101.0, 113.0]
2025-08-07 05:09:25,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 35 minutes, 26 seconds)
2025-08-07 05:11:12,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:11:14,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 272.06641 ± 151.405
2025-08-07 05:11:14,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [285.05762, 379.53992, 229.80907, 213.17482, 363.35052, 317.33463, 543.22015, 341.8961, 18.78719, 28.494251]
2025-08-07 05:11:14,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 192.0, 118.0, 136.0, 175.0, 149.0, 243.0, 167.0, 31.0, 45.0]
2025-08-07 05:11:14,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 33 minutes, 48 seconds)
2025-08-07 05:12:59,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:13:02,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 330.93927 ± 151.919
2025-08-07 05:13:02,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [451.37244, 558.90936, 18.55876, 321.4394, 335.16617, 206.56425, 545.10095, 283.242, 308.05655, 280.9827]
2025-08-07 05:13:02,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [237.0, 319.0, 27.0, 174.0, 163.0, 103.0, 272.0, 145.0, 141.0, 148.0]
2025-08-07 05:13:02,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (330.94) for latency ExtremeSparseL4U32
2025-08-07 05:13:02,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 31 minutes, 46 seconds)
2025-08-07 05:14:48,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:14:50,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 195.52544 ± 160.979
2025-08-07 05:14:50,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [265.68884, 387.9939, 131.86748, 485.73126, 324.79916, 235.26335, 72.35076, 13.579747, 20.68995, 17.289913]
2025-08-07 05:14:50,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 219.0, 134.0, 220.0, 152.0, 118.0, 116.0, 29.0, 32.0, 30.0]
2025-08-07 05:14:50,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 29 minutes, 46 seconds)
2025-08-07 05:16:35,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:16:36,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 164.20644 ± 145.625
2025-08-07 05:16:36,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [45.98834, 21.063854, 323.30734, 201.63576, 17.215384, 344.67654, 299.41382, 18.773533, 17.045912, 352.94388]
2025-08-07 05:16:36,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [55.0, 32.0, 224.0, 140.0, 30.0, 140.0, 145.0, 30.0, 30.0, 157.0]
2025-08-07 05:16:37,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 27 minutes, 56 seconds)
2025-08-07 05:18:22,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:18:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 241.90886 ± 137.489
2025-08-07 05:18:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [250.05597, 15.871915, 407.1493, 367.17163, 421.1846, 314.6456, 200.43445, 176.94809, 250.50578, 15.121427]
2025-08-07 05:18:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 32.0, 213.0, 177.0, 212.0, 160.0, 136.0, 122.0, 134.0, 29.0]
2025-08-07 05:18:24,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 26 minutes, 16 seconds)
2025-08-07 05:20:10,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:20:12,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 179.99394 ± 140.946
2025-08-07 05:20:12,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [203.50296, 229.89131, 221.89967, 174.36421, 48.50014, 79.06427, 15.600642, 17.605196, 327.55774, 481.95325]
2025-08-07 05:20:12,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [99.0, 105.0, 112.0, 101.0, 55.0, 76.0, 26.0, 30.0, 151.0, 200.0]
2025-08-07 05:20:12,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 24 minutes, 14 seconds)
2025-08-07 05:21:58,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:22:00,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 268.17859 ± 95.429
2025-08-07 05:22:00,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [363.36176, 246.03952, 290.76743, 40.09224, 209.22223, 206.05646, 287.63318, 386.877, 342.05658, 309.6795]
2025-08-07 05:22:00,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 124.0, 128.0, 52.0, 113.0, 110.0, 134.0, 160.0, 155.0, 160.0]
2025-08-07 05:22:00,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 22 minutes, 30 seconds)
2025-08-07 05:23:46,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:23:48,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 239.44897 ± 151.759
2025-08-07 05:23:48,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [432.80362, 20.38716, 232.75302, 399.22006, 278.75534, 342.86682, 32.09271, 16.810106, 289.62405, 349.17694]
2025-08-07 05:23:48,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [206.0, 31.0, 115.0, 197.0, 159.0, 208.0, 52.0, 27.0, 147.0, 208.0]
2025-08-07 05:23:48,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 20 minutes, 43 seconds)
2025-08-07 05:25:32,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:25:34,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 239.55107 ± 136.142
2025-08-07 05:25:34,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [229.23363, 42.497566, 218.20995, 522.08887, 322.1431, 28.679443, 346.35452, 194.74168, 271.72415, 219.83778]
2025-08-07 05:25:34,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 55.0, 116.0, 225.0, 151.0, 46.0, 189.0, 110.0, 154.0, 111.0]
2025-08-07 05:25:34,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 18 minutes, 51 seconds)
2025-08-07 05:27:20,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:27:22,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 186.10753 ± 109.798
2025-08-07 05:27:22,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [205.80725, 194.20897, 16.777863, 22.005163, 60.17518, 241.94067, 304.86636, 335.21936, 281.0795, 198.99507]
2025-08-07 05:27:22,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 118.0, 28.0, 32.0, 66.0, 126.0, 161.0, 247.0, 171.0, 108.0]
2025-08-07 05:27:22,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 17 minutes, 4 seconds)
2025-08-07 05:29:08,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:29:10,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 186.28122 ± 172.376
2025-08-07 05:29:10,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [215.79727, 42.67458, 33.65493, 22.062317, 58.90601, 156.08617, 627.00183, 236.02765, 177.31136, 293.2901]
2025-08-07 05:29:10,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 54.0, 49.0, 37.0, 59.0, 146.0, 286.0, 142.0, 147.0, 144.0]
2025-08-07 05:29:10,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 15 minutes, 25 seconds)
2025-08-07 05:30:55,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:30:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 267.39462 ± 135.639
2025-08-07 05:30:57,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [459.2427, 202.40512, 365.35318, 306.78958, 37.039692, 200.82431, 351.20648, 304.89258, 399.09818, 47.0944]
2025-08-07 05:30:57,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [206.0, 102.0, 214.0, 134.0, 47.0, 109.0, 174.0, 169.0, 189.0, 55.0]
2025-08-07 05:30:57,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 13 minutes, 29 seconds)
2025-08-07 05:32:45,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:32:47,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 284.44916 ± 175.196
2025-08-07 05:32:47,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [63.32936, 11.177027, 242.78008, 542.32324, 192.37611, 506.49976, 411.72806, 208.11555, 456.75024, 209.41248]
2025-08-07 05:32:47,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [67.0, 22.0, 150.0, 244.0, 188.0, 274.0, 203.0, 160.0, 218.0, 140.0]
2025-08-07 05:32:47,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 11 minutes, 57 seconds)
2025-08-07 05:34:31,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:34:34,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 361.20938 ± 156.727
2025-08-07 05:34:34,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [777.9108, 342.676, 233.16917, 307.12787, 288.02454, 383.36057, 349.04156, 293.2981, 456.79382, 180.69144]
2025-08-07 05:34:34,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [466.0, 163.0, 118.0, 187.0, 142.0, 182.0, 170.0, 131.0, 212.0, 97.0]
2025-08-07 05:34:34,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (361.21) for latency ExtremeSparseL4U32
2025-08-07 05:34:34,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 10 minutes, 14 seconds)
2025-08-07 05:36:21,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:36:24,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 357.40512 ± 78.999
2025-08-07 05:36:24,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [324.36133, 338.48358, 267.47745, 458.3281, 377.25974, 398.89166, 313.67352, 443.44025, 205.04791, 447.08786]
2025-08-07 05:36:24,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 159.0, 135.0, 193.0, 195.0, 200.0, 145.0, 236.0, 108.0, 219.0]
2025-08-07 05:36:24,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 8 minutes, 38 seconds)
2025-08-07 05:38:10,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:38:13,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 357.81918 ± 196.642
2025-08-07 05:38:13,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [43.00725, 316.65707, 20.412289, 489.06528, 430.09482, 336.51382, 268.54272, 511.3386, 675.41736, 487.14252]
2025-08-07 05:38:13,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [54.0, 144.0, 32.0, 247.0, 223.0, 181.0, 140.0, 293.0, 298.0, 254.0]
2025-08-07 05:38:13,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 6 minutes, 54 seconds)
2025-08-07 05:39:59,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:40:02,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 387.22821 ± 275.629
2025-08-07 05:40:02,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [316.12845, 401.85828, 652.3541, 312.35587, 61.497677, 200.9206, 456.3059, 195.09583, 1068.2286, 207.53676]
2025-08-07 05:40:02,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 191.0, 240.0, 141.0, 64.0, 102.0, 239.0, 153.0, 479.0, 117.0]
2025-08-07 05:40:02,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (387.23) for latency ExtremeSparseL4U32
2025-08-07 05:40:02,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 5 minutes, 19 seconds)
2025-08-07 05:41:46,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:41:49,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 333.73819 ± 145.811
2025-08-07 05:41:49,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [395.64487, 379.20926, 356.09027, 205.82819, 267.25705, 277.043, 310.19043, 723.0839, 221.32768, 201.70714]
2025-08-07 05:41:49,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 193.0, 281.0, 132.0, 138.0, 126.0, 198.0, 243.0, 219.0, 99.0]
2025-08-07 05:41:49,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 3 minutes, 12 seconds)
2025-08-07 05:43:35,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:43:38,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 323.90625 ± 156.118
2025-08-07 05:43:38,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [23.091705, 216.1284, 233.70566, 511.79575, 260.844, 226.82521, 499.463, 437.32745, 525.6742, 304.20715]
2025-08-07 05:43:38,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 110.0, 122.0, 202.0, 131.0, 101.0, 279.0, 256.0, 262.0, 141.0]
2025-08-07 05:43:38,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 1 minute, 36 seconds)
2025-08-07 05:45:22,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:45:25,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 303.10468 ± 166.597
2025-08-07 05:45:25,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [320.54544, 573.52716, 446.8994, 283.31085, 423.55768, 17.413305, 258.23987, 350.72928, 17.882614, 338.94092]
2025-08-07 05:45:25,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [156.0, 253.0, 191.0, 136.0, 192.0, 29.0, 148.0, 141.0, 32.0, 178.0]
2025-08-07 05:45:25,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 59 minutes, 31 seconds)
2025-08-07 05:47:12,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:47:15,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 301.89508 ± 183.239
2025-08-07 05:47:15,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [472.45578, 151.15852, 533.13025, 293.15564, 19.523697, 258.6235, 467.68066, 310.16476, 498.53354, 14.524418]
2025-08-07 05:47:15,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 84.0, 230.0, 145.0, 32.0, 129.0, 217.0, 155.0, 228.0, 26.0]
2025-08-07 05:47:15,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 57 minutes, 47 seconds)
2025-08-07 05:48:59,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:49:01,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 367.27838 ± 171.799
2025-08-07 05:49:01,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [623.38745, 15.566653, 363.5476, 194.6874, 503.04346, 343.1675, 408.93292, 280.84717, 354.01108, 585.5926]
2025-08-07 05:49:01,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [285.0, 29.0, 157.0, 105.0, 249.0, 159.0, 181.0, 136.0, 142.0, 235.0]
2025-08-07 05:49:01,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 55 minutes, 45 seconds)
2025-08-07 05:50:48,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:50:50,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 344.69107 ± 111.808
2025-08-07 05:50:50,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [305.74786, 204.23586, 269.07404, 443.33276, 356.20184, 347.91052, 559.36597, 286.9285, 476.04794, 198.06534]
2025-08-07 05:50:50,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [174.0, 120.0, 139.0, 192.0, 175.0, 150.0, 230.0, 159.0, 181.0, 164.0]
2025-08-07 05:50:50,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 54 minutes, 7 seconds)
2025-08-07 05:52:37,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:52:40,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 351.60248 ± 166.551
2025-08-07 05:52:40,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [210.64584, 548.917, 401.02545, 429.49677, 205.65579, 379.13977, 366.85358, 625.09845, 333.22345, 15.968639]
2025-08-07 05:52:40,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [108.0, 219.0, 208.0, 183.0, 114.0, 182.0, 177.0, 304.0, 143.0, 28.0]
2025-08-07 05:52:40,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 52 minutes, 21 seconds)
2025-08-07 05:54:25,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:54:27,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 242.67830 ± 177.704
2025-08-07 05:54:27,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [271.47592, 312.83554, 208.57391, 16.79613, 14.819705, 230.47513, 18.405561, 323.94757, 494.03876, 535.41473]
2025-08-07 05:54:27,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 150.0, 121.0, 29.0, 26.0, 109.0, 30.0, 149.0, 229.0, 202.0]
2025-08-07 05:54:27,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 50 minutes, 33 seconds)
2025-08-07 05:56:14,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:56:16,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 256.57831 ± 186.745
2025-08-07 05:56:16,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [359.67883, 565.001, 17.971682, 20.932419, 249.18915, 252.30666, 529.6163, 21.605457, 271.03827, 278.44302]
2025-08-07 05:56:16,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [155.0, 212.0, 30.0, 31.0, 140.0, 139.0, 226.0, 32.0, 140.0, 152.0]
2025-08-07 05:56:16,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 48 minutes, 44 seconds)
2025-08-07 05:58:01,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:58:03,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 252.93399 ± 124.166
2025-08-07 05:58:03,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [355.25372, 251.45305, 297.38422, 334.1612, 19.956879, 301.24295, 15.870767, 267.44702, 289.53833, 397.03183]
2025-08-07 05:58:03,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [194.0, 129.0, 162.0, 173.0, 31.0, 146.0, 29.0, 142.0, 158.0, 263.0]
2025-08-07 05:58:03,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 46 minutes, 56 seconds)
2025-08-07 05:59:48,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:59:51,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 340.60068 ± 105.399
2025-08-07 05:59:51,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [428.7149, 192.69334, 366.5776, 309.8501, 199.75551, 423.91037, 366.7376, 510.0386, 201.4958, 406.23294]
2025-08-07 05:59:51,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 177.0, 146.0, 157.0, 102.0, 243.0, 175.0, 278.0, 181.0, 190.0]
2025-08-07 05:59:51,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 45 minutes, 3 seconds)
2025-08-07 06:01:37,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:01:40,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 368.59851 ± 208.950
2025-08-07 06:01:40,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [277.34726, 372.66623, 179.92818, 169.48236, 20.638483, 396.9246, 681.2449, 722.40375, 432.61786, 432.73138]
2025-08-07 06:01:40,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 185.0, 140.0, 142.0, 31.0, 159.0, 250.0, 381.0, 169.0, 189.0]
2025-08-07 06:01:40,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 43 minutes, 14 seconds)
2025-08-07 06:03:26,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:03:28,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 211.76807 ± 137.848
2025-08-07 06:03:28,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [347.8557, 295.23715, 371.6352, 162.63266, 21.747356, 406.12997, 264.32394, 164.12944, 42.109264, 41.87979]
2025-08-07 06:03:28,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [212.0, 144.0, 156.0, 106.0, 33.0, 165.0, 150.0, 214.0, 53.0, 49.0]
2025-08-07 06:03:28,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 31 seconds)
2025-08-07 06:05:13,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:05:15,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 192.40913 ± 222.155
2025-08-07 06:05:15,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [547.5607, 584.3477, 364.64203, 292.8733, 46.20275, 14.683104, 16.979664, 20.55528, 18.899132, 17.347486]
2025-08-07 06:05:15,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [266.0, 240.0, 165.0, 142.0, 52.0, 26.0, 29.0, 30.0, 30.0, 29.0]
2025-08-07 06:05:15,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 30 seconds)
2025-08-07 06:07:01,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:07:04,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 317.10718 ± 87.618
2025-08-07 06:07:04,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [286.6296, 522.69916, 283.1135, 171.06062, 345.71112, 327.65024, 323.00137, 335.4345, 351.44687, 224.32469]
2025-08-07 06:07:04,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 234.0, 160.0, 97.0, 177.0, 145.0, 150.0, 163.0, 177.0, 125.0]
2025-08-07 06:07:04,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 50 seconds)
2025-08-07 06:08:51,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:08:53,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 241.45349 ± 134.200
2025-08-07 06:08:53,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [444.30606, 247.0973, 194.57779, 321.95297, 322.21225, 274.8302, 18.68338, 18.646992, 187.76993, 384.45807]
2025-08-07 06:08:53,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 129.0, 99.0, 162.0, 171.0, 144.0, 30.0, 32.0, 101.0, 171.0]
2025-08-07 06:08:53,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 36 minutes, 7 seconds)
2025-08-07 06:10:37,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:10:40,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 282.11221 ± 144.088
2025-08-07 06:10:40,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [407.7589, 13.142471, 362.25153, 434.97244, 15.199118, 260.29147, 398.13882, 278.37595, 306.34918, 344.64212]
2025-08-07 06:10:40,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 25.0, 205.0, 259.0, 27.0, 139.0, 229.0, 138.0, 152.0, 158.0]
2025-08-07 06:10:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 34 minutes, 9 seconds)
2025-08-07 06:12:26,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:12:29,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 355.93878 ± 296.858
2025-08-07 06:12:29,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [236.38745, 370.094, 18.951664, 793.4069, 506.40253, 407.19653, 299.7223, 11.657447, 13.872649, 901.6963]
2025-08-07 06:12:29,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [120.0, 160.0, 32.0, 430.0, 226.0, 184.0, 164.0, 25.0, 26.0, 424.0]
2025-08-07 06:12:29,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 28 seconds)
2025-08-07 06:14:15,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:14:18,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 435.49429 ± 238.412
2025-08-07 06:14:18,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [237.1573, 442.4928, 390.369, 351.87134, 629.6909, 314.22116, 300.91504, 1075.0676, 366.42456, 246.73297]
2025-08-07 06:14:18,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [106.0, 256.0, 182.0, 204.0, 290.0, 139.0, 156.0, 487.0, 164.0, 129.0]
2025-08-07 06:14:18,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (435.49) for latency ExtremeSparseL4U32
2025-08-07 06:14:18,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 47 seconds)
2025-08-07 06:16:05,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:16:08,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 337.47232 ± 220.979
2025-08-07 06:16:08,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [648.3665, 648.53827, 371.36066, 532.8858, 174.32631, 18.868317, 200.92653, 409.19543, 15.961916, 354.29312]
2025-08-07 06:16:08,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [324.0, 278.0, 184.0, 233.0, 131.0, 28.0, 111.0, 163.0, 26.0, 182.0]
2025-08-07 06:16:08,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 1 second)
2025-08-07 06:17:52,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:17:55,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 339.87552 ± 93.170
2025-08-07 06:17:55,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [402.58987, 320.80884, 363.88992, 277.84845, 563.9467, 256.49704, 259.10248, 361.0137, 225.12823, 367.92987]
2025-08-07 06:17:55,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [217.0, 137.0, 158.0, 132.0, 252.0, 143.0, 132.0, 169.0, 121.0, 182.0]
2025-08-07 06:17:55,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 5 seconds)
2025-08-07 06:19:41,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:19:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 317.24097 ± 237.354
2025-08-07 06:19:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [20.932968, 379.9865, 20.92315, 307.88962, 292.4553, 13.069261, 286.75955, 731.38, 579.2833, 539.72986]
2025-08-07 06:19:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 145.0, 32.0, 153.0, 151.0, 24.0, 143.0, 290.0, 262.0, 254.0]
2025-08-07 06:19:43,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 22 seconds)
2025-08-07 06:21:30,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:21:32,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 301.41611 ± 159.681
2025-08-07 06:21:32,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [368.86166, 356.90872, 460.81943, 368.66, 349.99387, 13.4302635, 484.46957, 390.29312, 16.250395, 204.47406]
2025-08-07 06:21:32,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [190.0, 196.0, 230.0, 157.0, 164.0, 25.0, 220.0, 183.0, 27.0, 117.0]
2025-08-07 06:21:32,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 31 seconds)
2025-08-07 06:23:19,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:23:22,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 293.89255 ± 99.754
2025-08-07 06:23:22,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [349.50198, 313.26862, 397.94534, 300.7275, 343.89005, 255.16055, 313.27753, 240.56165, 392.06177, 32.53044]
2025-08-07 06:23:22,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [191.0, 147.0, 177.0, 125.0, 180.0, 148.0, 181.0, 129.0, 177.0, 52.0]
2025-08-07 06:23:22,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 44 seconds)
2025-08-07 06:25:08,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:25:11,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 340.26865 ± 175.646
2025-08-07 06:25:11,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [414.89438, 373.743, 261.51016, 542.0983, 448.9542, 521.8362, 388.77667, 23.83091, 407.80548, 19.237383]
2025-08-07 06:25:11,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [215.0, 149.0, 141.0, 236.0, 228.0, 220.0, 185.0, 32.0, 195.0, 31.0]
2025-08-07 06:25:11,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 54 seconds)
2025-08-07 06:26:56,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:26:59,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 402.89621 ± 92.332
2025-08-07 06:26:59,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [397.25726, 574.60376, 412.69714, 386.4037, 268.1535, 566.0705, 341.97687, 350.0967, 339.93118, 391.77158]
2025-08-07 06:26:59,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 239.0, 204.0, 166.0, 131.0, 248.0, 149.0, 169.0, 170.0, 175.0]
2025-08-07 06:26:59,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 8 seconds)
2025-08-07 06:28:45,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:28:48,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 427.75284 ± 148.374
2025-08-07 06:28:48,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [252.60007, 284.72714, 306.41058, 354.9247, 734.35406, 451.47424, 567.74774, 408.5558, 330.73807, 585.9957]
2025-08-07 06:28:48,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 151.0, 160.0, 148.0, 266.0, 191.0, 207.0, 206.0, 142.0, 263.0]
2025-08-07 06:28:48,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 20 seconds)
2025-08-07 06:30:34,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:30:36,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 297.91669 ± 116.596
2025-08-07 06:30:36,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [317.53748, 444.3516, 282.99435, 407.63074, 297.74762, 15.136608, 312.9496, 253.57501, 227.7569, 419.48727]
2025-08-07 06:30:36,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [138.0, 173.0, 151.0, 198.0, 147.0, 27.0, 164.0, 126.0, 117.0, 155.0]
2025-08-07 06:30:36,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 29 seconds)
2025-08-07 06:32:20,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:32:24,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 439.82617 ± 230.451
2025-08-07 06:32:24,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [534.2908, 332.85434, 303.82016, 573.3874, 408.37024, 391.8656, 19.280186, 670.8127, 265.11597, 898.46423]
2025-08-07 06:32:24,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 155.0, 151.0, 219.0, 241.0, 196.0, 32.0, 302.0, 128.0, 501.0]
2025-08-07 06:32:24,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (439.83) for latency ExtremeSparseL4U32
2025-08-07 06:32:24,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 38 seconds)
2025-08-07 06:34:10,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:34:13,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 343.12170 ± 97.453
2025-08-07 06:34:13,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [335.90637, 194.06386, 467.01666, 427.03183, 270.4879, 506.50113, 238.82799, 270.51428, 388.6531, 332.2138]
2025-08-07 06:34:13,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [145.0, 109.0, 196.0, 232.0, 130.0, 222.0, 124.0, 141.0, 167.0, 136.0]
2025-08-07 06:34:13,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 50 seconds)
2025-08-07 06:35:59,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:36:03,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 413.51450 ± 215.631
2025-08-07 06:36:03,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [192.09259, 266.52313, 258.4672, 795.80383, 425.74292, 452.3914, 404.64862, 189.02773, 819.88196, 330.56583]
2025-08-07 06:36:03,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [97.0, 138.0, 139.0, 340.0, 277.0, 209.0, 237.0, 206.0, 308.0, 158.0]
2025-08-07 06:36:03,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 3 seconds)
2025-08-07 06:37:50,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:37:53,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 377.31079 ± 242.979
2025-08-07 06:37:53,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [295.2116, 397.13113, 805.1511, 425.24188, 747.5301, 438.13654, 256.49258, 348.32913, 16.173399, 43.7105]
2025-08-07 06:37:53,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 161.0, 325.0, 175.0, 318.0, 238.0, 133.0, 179.0, 29.0, 57.0]
2025-08-07 06:37:53,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 15 seconds)
2025-08-07 06:39:39,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:39:41,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 239.58267 ± 150.799
2025-08-07 06:39:41,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [22.964128, 556.70654, 352.94925, 15.466179, 235.05188, 191.00557, 212.24524, 323.2593, 300.88025, 185.29842]
2025-08-07 06:39:41,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 236.0, 145.0, 28.0, 124.0, 104.0, 118.0, 139.0, 131.0, 104.0]
2025-08-07 06:39:41,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 26 seconds)
2025-08-07 06:41:25,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:41:28,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 405.33670 ± 147.131
2025-08-07 06:41:28,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [282.50287, 248.6995, 404.8815, 259.95718, 293.2019, 526.90454, 742.73926, 352.80194, 430.95175, 510.72662]
2025-08-07 06:41:28,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 125.0, 168.0, 135.0, 145.0, 209.0, 307.0, 207.0, 189.0, 276.0]
2025-08-07 06:41:28,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 37 seconds)
2025-08-07 06:43:15,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:43:18,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 349.38925 ± 119.573
2025-08-07 06:43:18,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [220.5694, 289.1811, 268.0904, 249.14363, 205.80177, 603.33264, 411.15976, 453.03333, 427.95953, 365.62106]
2025-08-07 06:43:18,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 234.0, 131.0, 137.0, 105.0, 230.0, 208.0, 180.0, 184.0, 157.0]
2025-08-07 06:43:18,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 49 seconds)
2025-08-07 06:45:03,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:45:06,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 391.27362 ± 207.082
2025-08-07 06:45:06,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [375.83356, 376.77057, 235.09293, 18.35013, 276.3287, 444.79105, 754.6108, 311.81845, 723.6848, 395.45544]
2025-08-07 06:45:06,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [157.0, 166.0, 126.0, 29.0, 128.0, 172.0, 330.0, 156.0, 272.0, 194.0]
2025-08-07 06:45:06,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1251 [DEBUG]: Training session finished
