2025-08-07 00:48:19,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc0-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:19,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc0-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:19,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1546f829bf10>}
2025-08-07 00:48:19,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:19,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1133 [INFO]: Creating new trainer
2025-08-07 00:48:19,615 baseline-bpql-noiseperc0-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 00:48:19,615 baseline-bpql-noiseperc0-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:22,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:22,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 00:49:57,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:49:58,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 56.52653 ± 0.940
2025-08-07 00:49:58,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [55.44725, 54.99624, 56.93143, 56.653126, 57.961044, 57.394382, 57.118, 57.11897, 56.387943, 55.25692]
2025-08-07 00:49:58,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [49.0, 49.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 49.0]
2025-08-07 00:49:58,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (56.53) for latency ExtremeSparseL4U32
2025-08-07 00:49:58,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 37 minutes, 56 seconds)
2025-08-07 00:51:39,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:40,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 110.21511 ± 29.826
2025-08-07 00:51:40,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [128.41695, 128.80087, 64.71759, 129.22253, 130.86832, 130.00833, 64.51894, 130.37709, 130.45107, 64.76944]
2025-08-07 00:51:40,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 85.0, 57.0, 86.0, 86.0, 86.0, 57.0, 86.0, 86.0, 57.0]
2025-08-07 00:51:40,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (110.22) for latency ExtremeSparseL4U32
2025-08-07 00:51:40,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 41 minutes, 58 seconds)
2025-08-07 00:53:22,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:23,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 76.77481 ± 1.651
2025-08-07 00:53:23,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [73.667046, 76.43049, 75.56003, 79.20936, 78.2109, 75.80499, 78.01355, 76.22995, 78.83938, 75.78239]
2025-08-07 00:53:23,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 60.0, 60.0, 61.0, 61.0, 60.0, 61.0, 60.0, 61.0, 60.0]
2025-08-07 00:53:23,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 42 minutes, 4 seconds)
2025-08-07 00:55:04,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:05,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 115.83633 ± 7.142
2025-08-07 00:55:05,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [113.11643, 112.279144, 118.641556, 107.37918, 115.71822, 121.55351, 133.12645, 107.09982, 115.034485, 114.41457]
2025-08-07 00:55:05,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 92.0, 95.0, 89.0, 94.0, 100.0, 103.0, 90.0, 95.0, 94.0]
2025-08-07 00:55:05,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (115.84) for latency ExtremeSparseL4U32
2025-08-07 00:55:05,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 41 minutes, 19 seconds)
2025-08-07 00:56:46,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:48,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 213.26924 ± 11.409
2025-08-07 00:56:48,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [211.42055, 200.12169, 224.67757, 202.24164, 234.3038, 211.69305, 229.27783, 210.41109, 203.17093, 205.37418]
2025-08-07 00:56:48,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 114.0, 122.0, 115.0, 125.0, 118.0, 122.0, 115.0, 114.0, 116.0]
2025-08-07 00:56:48,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (213.27) for latency ExtremeSparseL4U32
2025-08-07 00:56:48,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 40 minutes, 9 seconds)
2025-08-07 00:58:29,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:31,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 227.24585 ± 39.597
2025-08-07 00:58:31,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [274.25214, 209.20021, 196.60503, 233.16103, 269.81134, 228.55301, 187.86008, 295.28717, 164.07059, 213.65804]
2025-08-07 00:58:31,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 120.0, 117.0, 130.0, 141.0, 127.0, 112.0, 151.0, 102.0, 120.0]
2025-08-07 00:58:31,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (227.25) for latency ExtremeSparseL4U32
2025-08-07 00:58:31,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 40 minutes, 54 seconds)
2025-08-07 01:00:12,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:14,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 192.65044 ± 29.780
2025-08-07 01:00:14,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [184.37589, 194.46814, 240.23532, 223.0772, 155.4849, 130.49974, 201.05908, 203.4188, 205.27641, 188.60901]
2025-08-07 01:00:14,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 117.0, 130.0, 124.0, 103.0, 93.0, 119.0, 120.0, 121.0, 114.0]
2025-08-07 01:00:14,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 39 minutes, 18 seconds)
2025-08-07 01:01:56,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:01:58,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 293.93295 ± 32.526
2025-08-07 01:01:58,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [211.01889, 299.72687, 301.97946, 289.85776, 281.12952, 316.95386, 331.7559, 279.45807, 298.36014, 329.08902]
2025-08-07 01:01:58,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 143.0, 144.0, 145.0, 140.0, 150.0, 151.0, 139.0, 144.0, 151.0]
2025-08-07 01:01:58,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (293.93) for latency ExtremeSparseL4U32
2025-08-07 01:01:58,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 38 minutes, 2 seconds)
2025-08-07 01:03:39,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:03:41,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 274.54816 ± 93.690
2025-08-07 01:03:41,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [142.54182, 262.70358, 226.91136, 166.23555, 364.1745, 352.5186, 386.9338, 364.6694, 336.54562, 142.24713]
2025-08-07 01:03:41,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 132.0, 123.0, 102.0, 159.0, 156.0, 168.0, 157.0, 151.0, 94.0]
2025-08-07 01:03:41,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 36 minutes, 25 seconds)
2025-08-07 01:05:23,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:05:25,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 241.28027 ± 53.546
2025-08-07 01:05:25,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [344.04517, 193.23584, 261.42273, 203.7962, 220.33005, 263.36133, 204.48045, 181.37952, 327.0774, 213.67404]
2025-08-07 01:05:25,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [169.0, 120.0, 149.0, 126.0, 129.0, 150.0, 128.0, 120.0, 152.0, 136.0]
2025-08-07 01:05:25,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 35 minutes, 6 seconds)
2025-08-07 01:07:06,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:07:08,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 245.66548 ± 33.710
2025-08-07 01:07:08,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [255.28456, 208.62346, 233.62082, 275.29932, 200.31023, 200.77205, 299.24167, 250.25674, 243.89046, 289.35547]
2025-08-07 01:07:08,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 126.0, 140.0, 155.0, 121.0, 121.0, 162.0, 147.0, 139.0, 159.0]
2025-08-07 01:07:08,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 33 minutes, 21 seconds)
2025-08-07 01:08:50,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:08:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 323.99078 ± 97.278
2025-08-07 01:08:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [210.63792, 259.143, 340.66507, 327.097, 286.28955, 224.16103, 366.0439, 297.5267, 574.0135, 354.33026]
2025-08-07 01:08:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 153.0, 186.0, 183.0, 166.0, 139.0, 189.0, 151.0, 234.0, 179.0]
2025-08-07 01:08:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (323.99) for latency ExtremeSparseL4U32
2025-08-07 01:08:53,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 32 minutes, 5 seconds)
2025-08-07 01:10:34,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:10:37,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 276.32355 ± 18.474
2025-08-07 01:10:37,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [290.69147, 231.86346, 269.27405, 277.25436, 294.00708, 262.88327, 284.74622, 295.44662, 289.4771, 267.59204]
2025-08-07 01:10:37,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 129.0, 157.0, 157.0, 162.0, 153.0, 157.0, 160.0, 164.0, 153.0]
2025-08-07 01:10:37,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 30 minutes, 23 seconds)
2025-08-07 01:12:19,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:12:22,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 392.76678 ± 75.664
2025-08-07 01:12:22,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [308.5923, 440.21222, 424.55518, 317.29065, 315.38675, 380.05402, 563.98584, 331.892, 433.46002, 412.23898]
2025-08-07 01:12:22,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 191.0, 186.0, 165.0, 152.0, 182.0, 240.0, 165.0, 183.0, 182.0]
2025-08-07 01:12:22,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (392.77) for latency ExtremeSparseL4U32
2025-08-07 01:12:22,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 29 minutes, 23 seconds)
2025-08-07 01:14:03,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:14:05,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 320.56332 ± 116.348
2025-08-07 01:14:05,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [432.49823, 347.86462, 384.49454, 131.8905, 139.41971, 294.53925, 438.75415, 365.48114, 207.87964, 462.81186]
2025-08-07 01:14:05,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [189.0, 161.0, 177.0, 86.0, 90.0, 148.0, 192.0, 173.0, 115.0, 193.0]
2025-08-07 01:14:05,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 27 minutes, 29 seconds)
2025-08-07 01:15:47,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:15:49,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 209.28967 ± 76.754
2025-08-07 01:15:49,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [124.13895, 322.1057, 124.52865, 352.20282, 146.6281, 226.41364, 265.44742, 200.97656, 165.43549, 165.01936]
2025-08-07 01:15:49,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 171.0, 84.0, 190.0, 94.0, 131.0, 139.0, 115.0, 102.0, 103.0]
2025-08-07 01:15:49,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 25 minutes, 44 seconds)
2025-08-07 01:17:31,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:17:34,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 308.36456 ± 89.641
2025-08-07 01:17:34,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [408.69138, 391.7919, 399.87836, 205.5595, 221.61192, 205.54494, 276.55313, 391.11252, 196.96127, 385.94058]
2025-08-07 01:17:34,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 175.0, 176.0, 117.0, 121.0, 117.0, 139.0, 170.0, 113.0, 172.0]
2025-08-07 01:17:34,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 24 minutes, 9 seconds)
2025-08-07 01:19:14,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:19:17,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 299.24771 ± 131.233
2025-08-07 01:19:17,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [294.75128, 184.87369, 208.78554, 393.6903, 187.97122, 386.89728, 197.06808, 611.7438, 338.4487, 188.2471]
2025-08-07 01:19:17,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 106.0, 117.0, 180.0, 108.0, 174.0, 111.0, 230.0, 158.0, 107.0]
2025-08-07 01:19:17,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 22 minutes, 6 seconds)
2025-08-07 01:20:59,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:00,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 117.10073 ± 8.557
2025-08-07 01:21:00,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [121.91701, 117.006935, 105.64815, 126.28621, 125.59937, 108.94882, 121.30832, 117.14868, 126.06479, 101.078964]
2025-08-07 01:21:00,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 79.0, 73.0, 85.0, 84.0, 76.0, 82.0, 80.0, 85.0, 71.0]
2025-08-07 01:21:00,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 19 minutes, 52 seconds)
2025-08-07 01:22:41,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:22:42,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 185.77452 ± 207.595
2025-08-07 01:22:42,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [419.788, 86.91626, 85.139114, 88.244965, 88.51271, 88.55855, 89.521774, 88.57563, 89.15372, 733.3346]
2025-08-07 01:22:42,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 64.0, 63.0, 65.0, 65.0, 65.0, 65.0, 65.0, 65.0, 267.0]
2025-08-07 01:22:42,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 17 minutes, 50 seconds)
2025-08-07 01:24:23,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:24:25,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 206.22842 ± 83.484
2025-08-07 01:24:25,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [161.84488, 170.80624, 172.30145, 409.53845, 172.03778, 160.23479, 158.20029, 328.29117, 167.25998, 161.76892]
2025-08-07 01:24:25,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 100.0, 101.0, 179.0, 100.0, 97.0, 96.0, 152.0, 99.0, 97.0]
2025-08-07 01:24:25,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 15 minutes, 58 seconds)
2025-08-07 01:26:05,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:26:09,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 558.91980 ± 203.330
2025-08-07 01:26:09,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [850.8286, 437.74527, 237.83922, 404.1144, 906.712, 484.05734, 465.69217, 605.668, 749.83, 446.71082]
2025-08-07 01:26:09,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [293.0, 192.0, 128.0, 181.0, 297.0, 216.0, 200.0, 239.0, 270.0, 192.0]
2025-08-07 01:26:09,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (558.92) for latency ExtremeSparseL4U32
2025-08-07 01:26:09,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 13 minutes, 54 seconds)
2025-08-07 01:27:49,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:52,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 346.79358 ± 65.306
2025-08-07 01:27:52,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [285.1418, 467.4098, 323.14337, 281.98578, 284.26114, 425.24625, 320.27405, 434.24966, 323.92694, 322.29697]
2025-08-07 01:27:52,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 206.0, 168.0, 150.0, 152.0, 207.0, 168.0, 206.0, 168.0, 167.0]
2025-08-07 01:27:52,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 12 minutes, 15 seconds)
2025-08-07 01:29:31,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:29:34,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 433.25763 ± 211.767
2025-08-07 01:29:34,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [317.12457, 932.6657, 287.5466, 426.4785, 405.5472, 278.75055, 735.733, 266.6414, 299.92957, 382.15927]
2025-08-07 01:29:34,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [169.0, 315.0, 150.0, 208.0, 188.0, 154.0, 266.0, 145.0, 161.0, 183.0]
2025-08-07 01:29:34,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 10 minutes, 15 seconds)
2025-08-07 01:31:15,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:31:18,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 437.92050 ± 133.395
2025-08-07 01:31:18,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [463.4929, 157.73506, 432.18802, 470.38696, 554.23755, 438.8671, 405.28497, 697.8566, 445.6352, 313.52063]
2025-08-07 01:31:18,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [215.0, 97.0, 207.0, 204.0, 253.0, 202.0, 186.0, 297.0, 209.0, 154.0]
2025-08-07 01:31:18,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 8 minutes, 54 seconds)
2025-08-07 01:32:58,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:33:00,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 298.12262 ± 171.300
2025-08-07 01:33:00,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [225.4271, 212.91586, 745.84595, 216.02814, 214.08638, 329.15198, 460.43396, 201.81282, 134.40974, 241.11436]
2025-08-07 01:33:00,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 122.0, 306.0, 122.0, 122.0, 167.0, 211.0, 119.0, 88.0, 131.0]
2025-08-07 01:33:00,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 7 minutes, 3 seconds)
2025-08-07 01:34:42,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:34:44,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 333.05743 ± 44.861
2025-08-07 01:34:44,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [331.78622, 312.47745, 335.25183, 461.39175, 311.07812, 308.83792, 295.97183, 322.26, 309.06522, 342.4539]
2025-08-07 01:34:44,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 164.0, 170.0, 210.0, 163.0, 162.0, 160.0, 167.0, 163.0, 173.0]
2025-08-07 01:34:44,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 5 minutes, 28 seconds)
2025-08-07 01:36:24,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:27,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 395.43414 ± 108.960
2025-08-07 01:36:27,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [453.32843, 251.94656, 397.19275, 442.57007, 563.1652, 434.03107, 214.4679, 436.86813, 262.03256, 498.7384]
2025-08-07 01:36:27,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 128.0, 173.0, 184.0, 225.0, 193.0, 117.0, 183.0, 132.0, 204.0]
2025-08-07 01:36:27,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 3 minutes, 36 seconds)
2025-08-07 01:38:06,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:09,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 292.18658 ± 18.511
2025-08-07 01:38:09,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [307.54422, 312.47125, 300.36072, 271.7111, 307.46045, 302.94946, 300.5311, 251.61423, 277.23273, 289.99063]
2025-08-07 01:38:09,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [162.0, 164.0, 159.0, 148.0, 161.0, 161.0, 159.0, 139.0, 150.0, 156.0]
2025-08-07 01:38:09,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 1 minute, 46 seconds)
2025-08-07 01:39:49,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:39:52,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 477.34854 ± 150.613
2025-08-07 01:39:52,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [154.17035, 418.5048, 415.7119, 452.77853, 761.6358, 399.51315, 574.1025, 560.0568, 455.00516, 582.0066]
2025-08-07 01:39:52,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 184.0, 183.0, 192.0, 271.0, 179.0, 225.0, 221.0, 192.0, 228.0]
2025-08-07 01:39:52,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 59 minutes, 52 seconds)
2025-08-07 01:41:32,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:41:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 566.52618 ± 168.939
2025-08-07 01:41:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [547.13336, 445.69415, 418.95743, 460.2865, 652.4608, 653.3096, 378.93008, 972.08673, 674.9007, 461.5025]
2025-08-07 01:41:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [233.0, 193.0, 189.0, 193.0, 267.0, 268.0, 177.0, 324.0, 263.0, 195.0]
2025-08-07 01:41:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (566.53) for latency ExtremeSparseL4U32
2025-08-07 01:41:35,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 58 minutes, 32 seconds)
2025-08-07 01:43:13,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:43:16,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 418.32770 ± 273.105
2025-08-07 01:43:16,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1016.94806, 890.52014, 303.8362, 295.54803, 332.0721, 294.7072, 298.16364, 313.7129, 284.8766, 152.89207]
2025-08-07 01:43:16,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [339.0, 334.0, 146.0, 144.0, 154.0, 144.0, 145.0, 148.0, 143.0, 93.0]
2025-08-07 01:43:16,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 55 minutes, 59 seconds)
2025-08-07 01:44:57,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:44:59,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 286.00739 ± 163.466
2025-08-07 01:44:59,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [191.18172, 415.57413, 193.6999, 189.81822, 685.98566, 187.88072, 180.69432, 439.72818, 190.9791, 184.53189]
2025-08-07 01:44:59,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 181.0, 108.0, 107.0, 273.0, 106.0, 104.0, 187.0, 107.0, 105.0]
2025-08-07 01:44:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 54 minutes, 18 seconds)
2025-08-07 01:46:38,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:46:42,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 594.44446 ± 276.917
2025-08-07 01:46:42,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [413.9875, 306.73578, 449.8371, 983.2356, 1151.4082, 574.3303, 478.58423, 304.46545, 823.9204, 457.94006]
2025-08-07 01:46:42,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 155.0, 196.0, 336.0, 387.0, 235.0, 198.0, 151.0, 386.0, 199.0]
2025-08-07 01:46:42,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (594.44) for latency ExtremeSparseL4U32
2025-08-07 01:46:42,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 52 minutes, 54 seconds)
2025-08-07 01:48:21,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:48:24,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 408.98508 ± 208.390
2025-08-07 01:48:24,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [485.6803, 322.8196, 453.84964, 764.2295, 213.6147, 188.3014, 455.71158, 767.748, 181.84784, 256.04834]
2025-08-07 01:48:24,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [202.0, 156.0, 186.0, 260.0, 117.0, 105.0, 186.0, 258.0, 103.0, 129.0]
2025-08-07 01:48:24,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 50 minutes, 59 seconds)
2025-08-07 01:50:07,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:50:11,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 696.75031 ± 157.345
2025-08-07 01:50:11,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [722.20416, 729.0467, 712.2716, 710.1204, 255.70802, 667.4404, 742.46313, 730.7217, 831.051, 866.4758]
2025-08-07 01:50:11,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [245.0, 246.0, 263.0, 246.0, 127.0, 228.0, 252.0, 247.0, 273.0, 282.0]
2025-08-07 01:50:11,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (696.75) for latency ExtremeSparseL4U32
2025-08-07 01:50:11,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 49 minutes, 57 seconds)
2025-08-07 01:51:47,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:51:51,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 580.89343 ± 229.763
2025-08-07 01:51:51,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [749.3949, 400.41486, 771.9036, 790.818, 819.1225, 858.48645, 400.93338, 165.96756, 423.06244, 428.83093]
2025-08-07 01:51:51,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [259.0, 174.0, 265.0, 281.0, 279.0, 286.0, 178.0, 98.0, 181.0, 184.0]
2025-08-07 01:51:51,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 48 minutes, 4 seconds)
2025-08-07 01:53:32,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:53:36,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 705.20300 ± 129.398
2025-08-07 01:53:36,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [665.83276, 824.4838, 735.38904, 646.4538, 805.01184, 665.12085, 952.44556, 653.29224, 440.2767, 663.72327]
2025-08-07 01:53:36,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [266.0, 337.0, 304.0, 257.0, 272.0, 258.0, 341.0, 262.0, 189.0, 269.0]
2025-08-07 01:53:36,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (705.20) for latency ExtremeSparseL4U32
2025-08-07 01:53:36,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 46 minutes, 59 seconds)
2025-08-07 01:55:15,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 743.02826 ± 367.744
2025-08-07 01:55:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1591.214, 653.95624, 596.26227, 469.9788, 1273.237, 603.0372, 476.14526, 803.28436, 390.41956, 572.74805]
2025-08-07 01:55:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [509.0, 267.0, 262.0, 210.0, 498.0, 259.0, 212.0, 337.0, 178.0, 244.0]
2025-08-07 01:55:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (743.03) for latency ExtremeSparseL4U32
2025-08-07 01:55:20,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 45 minutes, 15 seconds)
2025-08-07 01:57:01,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:05,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 684.30750 ± 112.256
2025-08-07 01:57:05,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [650.526, 750.85614, 668.87494, 730.80115, 931.0297, 700.4721, 524.3684, 636.36053, 520.0784, 729.70715]
2025-08-07 01:57:05,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [224.0, 253.0, 229.0, 249.0, 304.0, 239.0, 196.0, 221.0, 195.0, 248.0]
2025-08-07 01:57:05,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 44 minutes, 12 seconds)
2025-08-07 01:58:44,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:58:49,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 871.68408 ± 287.122
2025-08-07 01:58:49,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1148.017, 1004.4326, 927.02026, 972.3613, 675.6089, 377.72806, 430.21152, 762.8125, 1206.0098, 1212.6388]
2025-08-07 01:58:49,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [446.0, 337.0, 372.0, 327.0, 270.0, 170.0, 182.0, 307.0, 430.0, 450.0]
2025-08-07 01:58:49,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (871.68) for latency ExtremeSparseL4U32
2025-08-07 01:58:49,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 41 minutes, 56 seconds)
2025-08-07 02:00:29,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:00:31,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 217.60759 ± 5.516
2025-08-07 02:00:31,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [220.83197, 218.52269, 213.27827, 220.72087, 220.21225, 208.04915, 227.9998, 210.12088, 217.23753, 219.1026]
2025-08-07 02:00:31,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 117.0, 115.0, 117.0, 117.0, 113.0, 119.0, 114.0, 116.0, 116.0]
2025-08-07 02:00:31,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 40 minutes, 31 seconds)
2025-08-07 02:02:10,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:02:15,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 860.35712 ± 251.296
2025-08-07 02:02:15,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [918.7523, 795.29065, 1335.205, 919.937, 890.85236, 1031.3494, 909.1983, 284.00922, 689.6993, 829.27783]
2025-08-07 02:02:15,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [358.0, 303.0, 485.0, 362.0, 338.0, 335.0, 338.0, 137.0, 272.0, 299.0]
2025-08-07 02:02:15,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 38 minutes, 35 seconds)
2025-08-07 02:03:56,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:04:02,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1174.27820 ± 563.555
2025-08-07 02:04:02,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [811.59174, 1503.737, 671.5812, 989.11975, 806.4994, 2678.7644, 1320.2924, 1271.5017, 810.89246, 878.8014]
2025-08-07 02:04:02,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [292.0, 505.0, 273.0, 371.0, 304.0, 894.0, 482.0, 423.0, 305.0, 336.0]
2025-08-07 02:04:02,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (1174.28) for latency ExtremeSparseL4U32
2025-08-07 02:04:02,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 37 minutes, 34 seconds)
2025-08-07 02:05:43,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:05:46,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 584.66595 ± 230.852
2025-08-07 02:05:46,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [814.6667, 494.61356, 456.2183, 452.11276, 418.72775, 941.8245, 406.2154, 394.2015, 1025.5881, 442.4906]
2025-08-07 02:05:46,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [282.0, 202.0, 193.0, 191.0, 181.0, 310.0, 179.0, 177.0, 333.0, 188.0]
2025-08-07 02:05:46,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 35 minutes, 32 seconds)
2025-08-07 02:07:25,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:07:27,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 369.76852 ± 138.493
2025-08-07 02:07:27,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [289.01633, 706.151, 266.0081, 269.9265, 277.6536, 291.50034, 410.32867, 492.12854, 445.39648, 249.57587]
2025-08-07 02:07:27,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 285.0, 142.0, 144.0, 143.0, 150.0, 188.0, 202.0, 194.0, 135.0]
2025-08-07 02:07:27,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 33 minutes, 16 seconds)
2025-08-07 02:09:08,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:09:12,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 765.80212 ± 521.541
2025-08-07 02:09:12,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [169.53078, 983.9329, 871.8648, 943.4135, 432.75824, 436.79758, 874.7624, 2108.0383, 428.81406, 408.10837]
2025-08-07 02:09:12,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 335.0, 310.0, 330.0, 190.0, 186.0, 294.0, 766.0, 187.0, 182.0]
2025-08-07 02:09:12,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 32 minutes, 7 seconds)
2025-08-07 02:10:52,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:10:55,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 475.22461 ± 395.531
2025-08-07 02:10:55,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [386.63757, 397.31036, 254.17218, 397.377, 1650.1367, 411.93344, 339.1818, 357.12357, 309.7069, 248.66672]
2025-08-07 02:10:55,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [178.0, 179.0, 132.0, 179.0, 626.0, 187.0, 170.0, 175.0, 150.0, 130.0]
2025-08-07 02:10:55,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 30 minutes, 9 seconds)
2025-08-07 02:12:36,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:12:39,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 388.86746 ± 317.209
2025-08-07 02:12:39,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1224.7958, 222.87376, 230.22078, 240.21664, 741.8303, 241.06982, 224.92317, 311.60117, 223.10704, 228.0362]
2025-08-07 02:12:39,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [446.0, 119.0, 120.0, 123.0, 285.0, 123.0, 118.0, 146.0, 119.0, 120.0]
2025-08-07 02:12:39,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 27 minutes, 46 seconds)
2025-08-07 02:14:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:14:22,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 603.28015 ± 395.852
2025-08-07 02:14:22,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1462.6399, 302.8393, 401.48895, 732.21106, 143.41225, 526.59283, 161.99797, 1013.58655, 864.3836, 423.64914]
2025-08-07 02:14:22,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [508.0, 145.0, 178.0, 255.0, 91.0, 207.0, 98.0, 339.0, 342.0, 180.0]
2025-08-07 02:14:22,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 26 minutes)
2025-08-07 02:16:03,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:16:09,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1072.24390 ± 464.606
2025-08-07 02:16:09,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [661.80194, 1710.4779, 695.23413, 751.81525, 1517.7559, 1590.9905, 1631.1316, 705.1357, 1035.4004, 422.6963]
2025-08-07 02:16:09,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [245.0, 611.0, 268.0, 274.0, 535.0, 597.0, 600.0, 275.0, 356.0, 185.0]
2025-08-07 02:16:09,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 25 minutes, 8 seconds)
2025-08-07 02:17:47,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:17:50,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 404.67764 ± 73.085
2025-08-07 02:17:50,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [266.88113, 402.52893, 396.12103, 384.76157, 393.03012, 586.67255, 425.55365, 389.6723, 401.80353, 399.75162]
2025-08-07 02:17:50,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 180.0, 178.0, 174.0, 178.0, 226.0, 186.0, 176.0, 178.0, 179.0]
2025-08-07 02:17:50,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 22 minutes, 51 seconds)
2025-08-07 02:19:31,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:38,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1184.25024 ± 461.458
2025-08-07 02:19:38,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1097.1481, 2106.192, 995.7744, 717.399, 950.5833, 427.80667, 1301.0436, 1459.6819, 1720.9839, 1065.8899]
2025-08-07 02:19:38,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [376.0, 700.0, 367.0, 272.0, 359.0, 184.0, 454.0, 525.0, 610.0, 387.0]
2025-08-07 02:19:38,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (1184.25) for latency ExtremeSparseL4U32
2025-08-07 02:19:38,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 21 minutes, 48 seconds)
2025-08-07 02:21:17,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:23,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1155.96252 ± 365.779
2025-08-07 02:21:23,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1457.3522, 1459.664, 457.70035, 1349.8663, 992.46313, 1070.6139, 1539.985, 726.8605, 1602.8632, 902.2562]
2025-08-07 02:21:23,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [505.0, 466.0, 192.0, 449.0, 354.0, 356.0, 517.0, 271.0, 513.0, 305.0]
2025-08-07 02:21:23,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 20 minutes, 26 seconds)
2025-08-07 02:23:07,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:17,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1764.70056 ± 859.660
2025-08-07 02:23:17,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [719.2601, 992.05817, 1103.6482, 2452.9285, 2843.1301, 1212.1095, 2063.2598, 2619.015, 703.6751, 2937.923]
2025-08-07 02:23:17,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [284.0, 367.0, 428.0, 853.0, 1000.0, 430.0, 699.0, 877.0, 275.0, 1000.0]
2025-08-07 02:23:17,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (1764.70) for latency ExtremeSparseL4U32
2025-08-07 02:23:17,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 20 minutes, 14 seconds)
2025-08-07 02:24:55,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:25:02,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1320.26929 ± 669.652
2025-08-07 02:25:02,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1271.7498, 720.729, 1511.4386, 1874.2656, 703.3248, 1150.332, 3047.431, 1081.4152, 993.8566, 848.1507]
2025-08-07 02:25:02,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [421.0, 271.0, 517.0, 616.0, 262.0, 417.0, 1000.0, 407.0, 335.0, 312.0]
2025-08-07 02:25:02,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 18 minutes, 11 seconds)
2025-08-07 02:26:41,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:26:49,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1436.34351 ± 619.853
2025-08-07 02:26:49,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1506.6627, 706.40564, 1437.6171, 683.1268, 1196.0874, 1924.1603, 704.311, 2752.579, 1776.0543, 1676.43]
2025-08-07 02:26:49,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [495.0, 267.0, 484.0, 259.0, 426.0, 663.0, 264.0, 880.0, 579.0, 581.0]
2025-08-07 02:26:49,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 17 minutes, 16 seconds)
2025-08-07 02:28:30,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:28:36,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1006.88025 ± 384.555
2025-08-07 02:28:36,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1269.8862, 1513.1377, 893.77203, 1730.4581, 697.1228, 1170.1686, 723.24066, 717.8894, 439.7071, 913.41986]
2025-08-07 02:28:36,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [443.0, 536.0, 329.0, 618.0, 273.0, 429.0, 278.0, 281.0, 190.0, 345.0]
2025-08-07 02:28:36,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 15 minutes, 19 seconds)
2025-08-07 02:30:16,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:30:23,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1258.11304 ± 848.394
2025-08-07 02:30:23,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1788.2115, 1431.1434, 2674.5527, 977.97534, 629.59875, 434.49164, 623.54895, 2815.8442, 745.66, 460.10455]
2025-08-07 02:30:23,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [598.0, 506.0, 857.0, 357.0, 250.0, 191.0, 251.0, 936.0, 279.0, 195.0]
2025-08-07 02:30:23,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 13 minutes, 44 seconds)
2025-08-07 02:32:05,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1026.85950 ± 340.029
2025-08-07 02:32:11,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1283.4893, 1371.9506, 751.41675, 1096.7736, 590.5174, 879.37836, 841.2081, 720.2849, 1766.034, 967.5422]
2025-08-07 02:32:11,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [471.0, 485.0, 295.0, 430.0, 247.0, 354.0, 332.0, 281.0, 618.0, 340.0]
2025-08-07 02:32:11,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 11 minutes, 11 seconds)
2025-08-07 02:33:50,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:33:56,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1055.58447 ± 615.131
2025-08-07 02:33:56,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2404.5376, 714.4174, 835.38495, 857.2265, 436.60822, 676.27234, 1943.3792, 933.55597, 1305.1647, 449.298]
2025-08-07 02:33:56,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [796.0, 268.0, 289.0, 302.0, 185.0, 260.0, 622.0, 344.0, 424.0, 187.0]
2025-08-07 02:33:56,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 9 minutes, 26 seconds)
2025-08-07 02:35:37,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:35:42,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 885.65320 ± 481.371
2025-08-07 02:35:42,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [695.6062, 458.011, 1007.4997, 1489.595, 1960.7181, 1008.5181, 428.55096, 463.03918, 486.32874, 858.6653]
2025-08-07 02:35:42,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [269.0, 196.0, 339.0, 516.0, 671.0, 336.0, 188.0, 198.0, 199.0, 295.0]
2025-08-07 02:35:42,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 7 minutes, 30 seconds)
2025-08-07 02:37:26,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:37:34,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1342.85156 ± 519.564
2025-08-07 02:37:34,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [962.21265, 611.5975, 988.4426, 2316.5205, 2078.9768, 1448.0494, 1257.7992, 1743.3655, 960.4015, 1061.1492]
2025-08-07 02:37:34,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [363.0, 252.0, 377.0, 795.0, 721.0, 536.0, 448.0, 620.0, 325.0, 354.0]
2025-08-07 02:37:34,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 6 minutes, 22 seconds)
2025-08-07 02:39:13,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:39:20,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1304.04370 ± 849.873
2025-08-07 02:39:20,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [765.98065, 727.4954, 1482.6938, 817.17413, 166.93199, 1204.8662, 1411.1088, 780.3329, 2760.7004, 2923.152]
2025-08-07 02:39:20,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [285.0, 282.0, 538.0, 314.0, 99.0, 453.0, 510.0, 289.0, 957.0, 1000.0]
2025-08-07 02:39:20,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 4 minutes, 27 seconds)
2025-08-07 02:40:58,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:03,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 831.49091 ± 821.134
2025-08-07 02:41:03,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [433.48343, 457.3684, 3019.282, 437.25522, 1713.097, 463.13214, 436.84476, 466.26355, 453.16516, 435.0181]
2025-08-07 02:41:03,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 190.0, 976.0, 182.0, 569.0, 191.0, 182.0, 193.0, 188.0, 184.0]
2025-08-07 02:41:03,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 2 minutes, 5 seconds)
2025-08-07 02:42:48,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:56,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1534.68054 ± 799.990
2025-08-07 02:42:56,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1042.8982, 942.5453, 3038.1199, 1464.4377, 1015.35425, 951.6715, 1020.41473, 1868.7832, 3031.0867, 971.49457]
2025-08-07 02:42:56,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [360.0, 354.0, 1000.0, 485.0, 350.0, 357.0, 345.0, 597.0, 1000.0, 358.0]
2025-08-07 02:42:56,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 1 minute, 14 seconds)
2025-08-07 02:44:37,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:45,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1377.94312 ± 664.603
2025-08-07 02:44:45,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [702.2355, 3040.6067, 997.2793, 972.0789, 1794.06, 1825.321, 942.09314, 986.6542, 1525.913, 993.1893]
2025-08-07 02:44:45,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [271.0, 1000.0, 364.0, 358.0, 608.0, 618.0, 351.0, 362.0, 546.0, 361.0]
2025-08-07 02:44:45,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 59 minutes, 40 seconds)
2025-08-07 02:46:24,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:30,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1063.80701 ± 423.436
2025-08-07 02:46:30,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1016.809, 152.46263, 938.6749, 1264.9669, 955.08215, 758.1301, 1771.6125, 1553.2013, 950.1742, 1276.9575]
2025-08-07 02:46:30,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [355.0, 92.0, 339.0, 430.0, 339.0, 276.0, 553.0, 498.0, 343.0, 440.0]
2025-08-07 02:46:30,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 57 minutes, 12 seconds)
2025-08-07 02:48:10,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:48:17,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1427.49463 ± 399.078
2025-08-07 02:48:17,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1290.9514, 2332.183, 1257.4845, 1866.4426, 1229.2432, 1307.2208, 1471.6031, 952.06354, 959.1111, 1608.6423]
2025-08-07 02:48:17,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [424.0, 736.0, 446.0, 591.0, 397.0, 423.0, 512.0, 342.0, 345.0, 527.0]
2025-08-07 02:48:17,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 55 minutes, 29 seconds)
2025-08-07 02:49:59,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:50:09,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1936.41345 ± 782.945
2025-08-07 02:50:09,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2227.8027, 2959.785, 1834.861, 1583.5537, 1827.9377, 267.94427, 1814.4485, 2957.2278, 2656.046, 1234.5286]
2025-08-07 02:50:09,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [731.0, 1000.0, 616.0, 521.0, 606.0, 136.0, 622.0, 962.0, 886.0, 451.0]
2025-08-07 02:50:09,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (1936.41) for latency ExtremeSparseL4U32
2025-08-07 02:50:09,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 54 minutes, 36 seconds)
2025-08-07 02:51:48,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:51:53,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 989.16425 ± 167.492
2025-08-07 02:51:53,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1460.7639, 915.38226, 1068.9943, 928.6135, 1004.56305, 852.97095, 919.4191, 941.6488, 880.7886, 918.49817]
2025-08-07 02:51:53,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [464.0, 304.0, 355.0, 340.0, 344.0, 292.0, 313.0, 315.0, 299.0, 308.0]
2025-08-07 02:51:53,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 51 minutes, 54 seconds)
2025-08-07 02:53:40,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:46,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1016.85608 ± 298.651
2025-08-07 02:53:46,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1240.8617, 1046.0563, 976.7379, 732.12305, 820.2214, 717.8143, 1706.7914, 972.8097, 1250.5953, 704.5492]
2025-08-07 02:53:46,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [463.0, 372.0, 370.0, 285.0, 320.0, 279.0, 619.0, 366.0, 452.0, 275.0]
2025-08-07 02:53:46,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 50 minutes, 29 seconds)
2025-08-07 02:55:27,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:40,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2516.34058 ± 486.216
2025-08-07 02:55:40,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2748.1318, 3132.8828, 1939.0837, 3068.2358, 2383.2715, 1988.4219, 2542.254, 2554.2585, 3099.274, 1707.5945]
2025-08-07 02:55:40,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [888.0, 1000.0, 629.0, 1000.0, 787.0, 611.0, 848.0, 846.0, 1000.0, 558.0]
2025-08-07 02:55:40,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (2516.34) for latency ExtremeSparseL4U32
2025-08-07 02:55:40,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 49 minutes, 27 seconds)
2025-08-07 02:57:14,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:23,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1618.99512 ± 636.004
2025-08-07 02:57:23,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [964.83386, 1406.8376, 1422.2194, 1873.6285, 965.48737, 973.3475, 1328.493, 2608.9927, 1789.8126, 2856.2986]
2025-08-07 02:57:23,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [349.0, 455.0, 458.0, 621.0, 349.0, 360.0, 463.0, 822.0, 571.0, 929.0]
2025-08-07 02:57:23,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 47 minutes, 17 seconds)
2025-08-07 02:59:07,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:15,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1332.66577 ± 724.386
2025-08-07 02:59:15,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [748.73486, 2346.0466, 741.83344, 751.49304, 2428.3877, 1780.4805, 729.372, 2236.49, 826.1969, 737.6227]
2025-08-07 02:59:15,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [282.0, 787.0, 280.0, 285.0, 800.0, 598.0, 278.0, 713.0, 304.0, 280.0]
2025-08-07 02:59:15,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 45 minutes, 27 seconds)
2025-08-07 03:00:56,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:04,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1399.42651 ± 1016.208
2025-08-07 03:01:04,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [568.8237, 2497.8462, 580.731, 570.994, 3047.6636, 1104.7126, 3099.8096, 573.0488, 565.2412, 1385.3955]
2025-08-07 03:01:04,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [227.0, 815.0, 230.0, 222.0, 1000.0, 390.0, 1000.0, 228.0, 224.0, 449.0]
2025-08-07 03:01:04,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 44 minutes, 2 seconds)
2025-08-07 03:02:45,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:02:48,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 425.90845 ± 389.496
2025-08-07 03:02:48,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [945.6898, 173.59598, 167.36568, 1084.3649, 165.72519, 170.21606, 190.93126, 1026.5682, 165.90831, 168.71936]
2025-08-07 03:02:48,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [292.0, 101.0, 99.0, 346.0, 99.0, 100.0, 107.0, 323.0, 99.0, 100.0]
2025-08-07 03:02:48,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 34 seconds)
2025-08-07 03:04:30,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:04:35,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 874.15088 ± 563.451
2025-08-07 03:04:35,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [474.90726, 494.00708, 863.46277, 410.19202, 472.18478, 1175.3607, 2365.9805, 1032.8451, 959.4409, 493.12784]
2025-08-07 03:04:35,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 203.0, 302.0, 176.0, 194.0, 380.0, 766.0, 354.0, 355.0, 202.0]
2025-08-07 03:04:35,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 15 seconds)
2025-08-07 03:06:10,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:06:15,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 938.64874 ± 582.778
2025-08-07 03:06:15,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [746.4379, 740.9119, 759.357, 740.06775, 467.2544, 716.6865, 1087.947, 755.3372, 2635.8516, 736.6358]
2025-08-07 03:06:15,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [278.0, 279.0, 283.0, 274.0, 195.0, 267.0, 371.0, 282.0, 867.0, 274.0]
2025-08-07 03:06:15,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 17 seconds)
2025-08-07 03:07:57,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:08:03,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1076.00500 ± 603.575
2025-08-07 03:08:03,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2733.0676, 760.8346, 753.81854, 1535.6063, 747.2981, 774.95496, 1141.1672, 759.4839, 796.51215, 757.30695]
2025-08-07 03:08:03,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [890.0, 274.0, 261.0, 505.0, 261.0, 275.0, 357.0, 260.0, 279.0, 260.0]
2025-08-07 03:08:03,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 35 minutes, 13 seconds)
2025-08-07 03:09:44,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:09:57,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2486.19922 ± 649.356
2025-08-07 03:09:57,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1803.7255, 3016.4255, 1295.664, 1807.6367, 3097.712, 3094.948, 3089.045, 3100.987, 2375.989, 2179.8596]
2025-08-07 03:09:57,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [587.0, 974.0, 462.0, 585.0, 1000.0, 1000.0, 1000.0, 1000.0, 789.0, 728.0]
2025-08-07 03:09:57,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 44 seconds)
2025-08-07 03:11:42,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1142.80823 ± 843.077
2025-08-07 03:11:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2031.601, 1176.3854, 492.66333, 656.2386, 1536.7146, 491.03384, 450.78116, 465.5211, 3169.0283, 958.11505]
2025-08-07 03:11:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [635.0, 369.0, 199.0, 246.0, 496.0, 199.0, 188.0, 192.0, 1000.0, 334.0]
2025-08-07 03:11:48,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 22 seconds)
2025-08-07 03:13:25,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:31,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1123.20007 ± 376.902
2025-08-07 03:13:31,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [940.25916, 848.14325, 1830.8252, 1014.7439, 908.52563, 1834.3271, 843.904, 851.49274, 865.9268, 1293.8531]
2025-08-07 03:13:31,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [309.0, 296.0, 620.0, 346.0, 300.0, 613.0, 288.0, 292.0, 294.0, 429.0]
2025-08-07 03:13:31,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 22 seconds)
2025-08-07 03:15:10,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:20,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1922.34399 ± 1051.579
2025-08-07 03:15:20,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3166.9263, 3187.4966, 3169.527, 1566.5743, 978.64685, 3035.4531, 1561.701, 267.5853, 1046.1902, 1243.3402]
2025-08-07 03:15:20,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 529.0, 358.0, 1000.0, 524.0, 132.0, 383.0, 424.0]
2025-08-07 03:15:20,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 3 seconds)
2025-08-07 03:17:10,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:21,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2274.97070 ± 446.089
2025-08-07 03:17:21,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2538.149, 1628.7849, 2955.398, 1867.532, 2677.4775, 1594.3424, 2542.6782, 2136.1528, 2155.2617, 2653.9312]
2025-08-07 03:17:21,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [799.0, 496.0, 901.0, 587.0, 810.0, 502.0, 765.0, 652.0, 658.0, 812.0]
2025-08-07 03:17:21,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 54 seconds)
2025-08-07 03:18:57,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:02,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1068.65674 ± 290.845
2025-08-07 03:19:02,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [739.796, 1546.2137, 737.92523, 1029.6049, 1544.7185, 1304.55, 1015.4904, 977.9877, 744.7266, 1045.555]
2025-08-07 03:19:02,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [266.0, 509.0, 269.0, 326.0, 466.0, 442.0, 348.0, 317.0, 271.0, 357.0]
2025-08-07 03:19:02,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 26 seconds)
2025-08-07 03:20:41,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:20:43,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 297.86700 ± 452.050
2025-08-07 03:20:43,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [145.34567, 149.46541, 144.47833, 1653.9811, 145.26755, 154.35378, 143.74657, 149.27382, 149.22934, 143.52846]
2025-08-07 03:20:43,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 93.0, 91.0, 497.0, 92.0, 95.0, 91.0, 93.0, 93.0, 91.0]
2025-08-07 03:20:43,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 12 seconds)
2025-08-07 03:22:23,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:32,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1911.12659 ± 503.805
2025-08-07 03:22:32,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1662.7842, 2169.804, 1914.5928, 3155.9075, 1632.8627, 2185.7695, 1347.5074, 1298.0057, 1944.8638, 1799.1671]
2025-08-07 03:22:32,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [538.0, 646.0, 624.0, 1000.0, 539.0, 676.0, 454.0, 438.0, 610.0, 566.0]
2025-08-07 03:22:32,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 38 seconds)
2025-08-07 03:24:20,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:24:34,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2668.14404 ± 711.839
2025-08-07 03:24:34,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1351.3813, 3146.6946, 3096.8823, 1278.4562, 3119.561, 3088.5957, 3076.9807, 2363.0388, 3011.6963, 3148.1538]
2025-08-07 03:24:34,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [415.0, 1000.0, 1000.0, 391.0, 1000.0, 1000.0, 1000.0, 730.0, 1000.0, 1000.0]
2025-08-07 03:24:34,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (2668.14) for latency ExtremeSparseL4U32
2025-08-07 03:24:34,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 16 seconds)
2025-08-07 03:26:08,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:26:17,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1566.20386 ± 1002.097
2025-08-07 03:26:17,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [408.2061, 2503.1875, 1042.0714, 3120.6697, 475.5995, 2193.551, 901.42896, 3102.4731, 903.7499, 1011.1015]
2025-08-07 03:26:17,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [177.0, 767.0, 351.0, 976.0, 197.0, 690.0, 319.0, 1000.0, 322.0, 368.0]
2025-08-07 03:26:17,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 50 seconds)
2025-08-07 03:27:57,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:28:05,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1333.48206 ± 990.576
2025-08-07 03:28:05,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [310.76987, 820.3404, 310.33884, 313.833, 2512.7146, 303.1561, 1883.039, 2248.0693, 1647.6165, 2984.9434]
2025-08-07 03:28:05,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 307.0, 145.0, 146.0, 847.0, 144.0, 661.0, 761.0, 570.0, 1000.0]
2025-08-07 03:28:05,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 16 seconds)
2025-08-07 03:29:48,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:30:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2305.25317 ± 607.440
2025-08-07 03:30:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1832.4442, 2611.6362, 1803.4161, 2151.1372, 2980.7607, 1978.617, 2107.065, 1279.4753, 3125.5723, 3182.409]
2025-08-07 03:30:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [582.0, 862.0, 574.0, 722.0, 939.0, 626.0, 680.0, 448.0, 1000.0, 1000.0]
2025-08-07 03:30:00,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 51 seconds)
2025-08-07 03:31:42,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:31:50,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1564.90894 ± 1160.478
2025-08-07 03:31:50,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [473.92426, 484.8587, 2620.9917, 474.67627, 477.57864, 470.35614, 2962.4497, 2997.246, 1566.0249, 3120.982]
2025-08-07 03:31:50,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 198.0, 828.0, 195.0, 194.0, 194.0, 1000.0, 1000.0, 517.0, 1000.0]
2025-08-07 03:31:50,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes)
2025-08-07 03:33:33,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:33:44,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2165.65967 ± 1006.531
2025-08-07 03:33:44,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1020.55853, 2027.4943, 3240.8867, 3151.5156, 2644.421, 3122.7852, 3122.1782, 1396.4504, 1719.5908, 210.71883]
2025-08-07 03:33:44,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [357.0, 653.0, 1000.0, 1000.0, 794.0, 1000.0, 1000.0, 477.0, 563.0, 114.0]
2025-08-07 03:33:44,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 1 second)
2025-08-07 03:35:21,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:35:28,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1377.72485 ± 751.068
2025-08-07 03:35:28,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2235.309, 392.98962, 913.4975, 951.1878, 1152.1099, 3172.5676, 1303.4949, 899.469, 1504.4774, 1252.1458]
2025-08-07 03:35:28,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [694.0, 171.0, 340.0, 342.0, 399.0, 1000.0, 421.0, 335.0, 512.0, 414.0]
2025-08-07 03:35:28,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 11 seconds)
2025-08-07 03:37:09,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:37:22,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2580.46680 ± 872.230
2025-08-07 03:37:22,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1194.4634, 3196.1665, 3166.1572, 3029.387, 1214.2579, 1344.8475, 3134.4836, 3145.0916, 3164.0034, 3215.808]
2025-08-07 03:37:22,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [367.0, 1000.0, 1000.0, 1000.0, 365.0, 424.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:37:22,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 26 seconds)
2025-08-07 03:39:08,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:39:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2563.42480 ± 940.987
2025-08-07 03:39:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3114.2202, 477.8214, 3208.9834, 3105.024, 3189.4285, 1759.2325, 3109.474, 1358.622, 3129.08, 3182.3616]
2025-08-07 03:39:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 192.0, 1000.0, 1000.0, 1000.0, 534.0, 1000.0, 463.0, 1000.0, 1000.0]
2025-08-07 03:39:21,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 36 seconds)
2025-08-07 03:40:59,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:41:04,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 819.47424 ± 545.387
2025-08-07 03:41:04,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1946.2018, 395.64966, 1282.0151, 395.22565, 1476.245, 467.565, 401.25674, 1034.3851, 398.7888, 397.4092]
2025-08-07 03:41:04,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [620.0, 169.0, 446.0, 169.0, 483.0, 190.0, 168.0, 354.0, 169.0, 170.0]
2025-08-07 03:41:04,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 41 seconds)
2025-08-07 03:42:53,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:43:05,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2223.10181 ± 1055.710
2025-08-07 03:43:05,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3068.886, 3134.9663, 3067.859, 3047.9592, 720.94525, 3059.7795, 1303.0282, 728.45184, 1023.13684, 3076.0051]
2025-08-07 03:43:05,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 271.0, 1000.0, 460.0, 274.0, 371.0, 1000.0]
2025-08-07 03:43:05,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 52 seconds)
2025-08-07 03:44:41,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:53,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2305.32959 ± 877.580
2025-08-07 03:44:53,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1313.5609, 1375.0686, 767.1389, 3187.4763, 1839.9303, 2216.1992, 3049.829, 3176.1719, 3164.0469, 2963.8757]
2025-08-07 03:44:53,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [430.0, 447.0, 281.0, 1000.0, 578.0, 710.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:44:53,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1251 [DEBUG]: Training session finished
