2025-09-16 09:01:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_12
2025-09-16 09:01:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_12
2025-09-16 09:01:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'12': <latency_env.delayed_mdp.ConstantDelay object at 0x150975528850>}
2025-09-16 09:01:28,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-16 09:01:28,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-16 09:01:28,462 baseline-bpql-noisepromille200-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=89, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-16 09:01:28,462 baseline-bpql-noisepromille200-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 09:01:29,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-16 09:01:29,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-16 09:03:01,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:03:11,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -361.63672 ± 50.408
2025-09-16 09:03:11,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [-446.05936, -401.74866, -367.7957, -336.7122, -302.811, -385.22244, -381.93173, -268.54037, -324.59003, -400.9557]
2025-09-16 09:03:11,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:03:11,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-361.64) for latency 12
2025-09-16 09:03:11,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 48 minutes, 26 seconds)
2025-09-16 09:04:47,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:04:57,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -288.26413 ± 42.575
2025-09-16 09:04:57,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [-303.762, -367.7562, -304.3356, -306.54147, -216.30373, -297.23193, -255.17224, -329.40912, -260.0257, -242.10342]
2025-09-16 09:04:57,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:04:57,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-288.26) for latency 12
2025-09-16 09:04:57,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 50 minutes, 17 seconds)
2025-09-16 09:06:34,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:06:44,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -186.20746 ± 97.206
2025-09-16 09:06:44,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [-41.691273, -301.43115, -215.68246, -284.24075, -218.06898, -292.1543, -240.00743, -72.49002, -154.17448, -42.133804]
2025-09-16 09:06:44,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:06:44,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-186.21) for latency 12
2025-09-16 09:06:44,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 49 minutes, 50 seconds)
2025-09-16 09:08:21,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:08:31,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -88.09923 ± 80.111
2025-09-16 09:08:31,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [40.815063, 28.9508, -177.31097, -54.417328, -58.04184, -63.211582, -80.170975, -138.06682, -187.75313, -191.78554]
2025-09-16 09:08:31,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:08:31,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-88.10) for latency 12
2025-09-16 09:08:31,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 48 minutes, 42 seconds)
2025-09-16 09:10:07,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:10:17,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -35.85735 ± 94.612
2025-09-16 09:10:17,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [103.71867, -120.70634, -137.90251, -0.54058087, 90.72278, -7.2719784, -125.26602, 41.868984, -25.262373, -177.93411]
2025-09-16 09:10:17,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:10:17,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-35.86) for latency 12
2025-09-16 09:10:17,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 47 minutes, 18 seconds)
2025-09-16 09:11:53,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:12:04,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -55.02696 ± 136.782
2025-09-16 09:12:04,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [32.053684, -377.68427, -101.485245, -35.29081, -84.08321, -64.15027, 101.37703, 73.87218, -171.40503, 76.52637]
2025-09-16 09:12:04,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:12:04,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 46 minutes, 55 seconds)
2025-09-16 09:13:40,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:13:50,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 93.96763 ± 86.555
2025-09-16 09:13:50,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [47.111305, 104.42137, -15.686719, 135.01039, -25.36319, 160.47108, 86.36233, 251.94704, 184.87514, 10.527584]
2025-09-16 09:13:50,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:13:50,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (93.97) for latency 12
2025-09-16 09:13:50,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 45 minutes, 6 seconds)
2025-09-16 09:15:27,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:15:37,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 207.84827 ± 122.906
2025-09-16 09:15:37,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [107.87391, 318.83084, 192.00874, 318.18893, 331.40192, 201.76608, 78.17996, 70.50737, 409.26923, 50.455463]
2025-09-16 09:15:37,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:15:37,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (207.85) for latency 12
2025-09-16 09:15:37,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 43 minutes, 19 seconds)
2025-09-16 09:17:13,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:17:23,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 320.02127 ± 237.287
2025-09-16 09:17:23,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [-302.5066, 579.2518, 261.90155, 347.91803, 440.75745, 537.2407, 430.95, 379.97037, 162.75592, 361.9734]
2025-09-16 09:17:23,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:17:23,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (320.02) for latency 12
2025-09-16 09:17:24,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 41 minutes, 39 seconds)
2025-09-16 09:19:00,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:19:10,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 474.36407 ± 113.954
2025-09-16 09:19:10,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [429.871, 421.03537, 556.4085, 419.19113, 337.74652, 587.1674, 576.92865, 326.27942, 400.31924, 688.6935]
2025-09-16 09:19:10,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:19:10,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (474.36) for latency 12
2025-09-16 09:19:10,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 39 minutes, 50 seconds)
2025-09-16 09:20:46,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:20:56,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 502.24072 ± 245.526
2025-09-16 09:20:56,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [585.7739, 622.2064, 628.7806, -202.1293, 721.7098, 437.1543, 606.76733, 560.65845, 552.45624, 509.02957]
2025-09-16 09:20:56,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:20:56,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (502.24) for latency 12
2025-09-16 09:20:56,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 38 minutes, 3 seconds)
2025-09-16 09:22:33,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:22:43,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 573.08380 ± 315.888
2025-09-16 09:22:43,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [733.2302, 638.0755, 776.84186, 549.409, 713.2749, 544.08295, 649.56946, 790.90375, 679.3315, -343.88123]
2025-09-16 09:22:43,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:22:43,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (573.08) for latency 12
2025-09-16 09:22:43,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 36 minutes, 22 seconds)
2025-09-16 09:24:19,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:24:30,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 749.35675 ± 127.333
2025-09-16 09:24:30,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [605.6725, 876.3437, 924.41156, 859.9288, 717.5853, 817.6398, 698.20105, 673.66284, 822.1258, 497.99588]
2025-09-16 09:24:30,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:24:30,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (749.36) for latency 12
2025-09-16 09:24:30,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 34 minutes, 36 seconds)
2025-09-16 09:26:06,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:26:16,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 703.17181 ± 230.166
2025-09-16 09:26:16,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [825.7859, 664.56525, 828.85736, 881.3736, 609.7611, 771.40283, 1072.2152, 163.0825, 552.34985, 662.3245]
2025-09-16 09:26:16,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:26:16,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 32 minutes, 42 seconds)
2025-09-16 09:27:53,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:28:02,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 671.07556 ± 120.395
2025-09-16 09:28:02,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [841.0593, 839.0675, 643.5738, 754.19965, 560.096, 667.9659, 773.35834, 620.0497, 511.80072, 499.58502]
2025-09-16 09:28:02,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:28:02,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 30 minutes, 52 seconds)
2025-09-16 09:29:39,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:29:49,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 790.09070 ± 144.951
2025-09-16 09:29:49,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1029.5297, 679.9537, 792.504, 988.092, 717.9533, 741.55927, 565.7766, 627.91095, 906.3456, 851.28235]
2025-09-16 09:29:49,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:29:49,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (790.09) for latency 12
2025-09-16 09:29:49,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 29 minutes, 6 seconds)
2025-09-16 09:31:25,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:31:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 866.79431 ± 133.804
2025-09-16 09:31:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [865.63983, 704.473, 1002.0963, 1118.1425, 1015.6535, 811.01056, 815.7088, 673.2828, 790.3007, 871.63544]
2025-09-16 09:31:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:31:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (866.79) for latency 12
2025-09-16 09:31:35,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 27 minutes, 16 seconds)
2025-09-16 09:33:12,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:33:22,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 958.69452 ± 118.700
2025-09-16 09:33:22,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1047.651, 909.6728, 988.98846, 908.2976, 972.5492, 1023.4193, 711.72455, 837.84894, 1019.7258, 1167.067]
2025-09-16 09:33:22,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:33:22,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (958.69) for latency 12
2025-09-16 09:33:22,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 25 minutes, 22 seconds)
2025-09-16 09:34:58,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:35:08,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1167.33118 ± 187.660
2025-09-16 09:35:08,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1214.267, 1375.6521, 1096.3226, 1525.8844, 810.41547, 1160.1373, 1152.6421, 1268.1125, 1041.7035, 1028.1752]
2025-09-16 09:35:08,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:35:08,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1167.33) for latency 12
2025-09-16 09:35:08,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 23 minutes, 30 seconds)
2025-09-16 09:36:44,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:36:54,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1128.78735 ± 186.526
2025-09-16 09:36:54,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1186.4739, 1152.9436, 932.2596, 1322.9806, 1047.5344, 1199.3805, 1170.0275, 1310.2661, 1282.1249, 683.88226]
2025-09-16 09:36:54,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:36:54,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 21 minutes, 41 seconds)
2025-09-16 09:38:30,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:38:40,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1052.78845 ± 146.213
2025-09-16 09:38:40,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1056.8616, 1015.7697, 1128.7523, 939.3021, 1042.0889, 1274.3846, 784.3958, 894.44836, 1252.6609, 1139.2206]
2025-09-16 09:38:40,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:38:40,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 19 minutes, 54 seconds)
2025-09-16 09:40:17,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:40:27,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1173.93164 ± 121.751
2025-09-16 09:40:27,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1210.2576, 1112.493, 1078.2921, 1178.4542, 928.8557, 1260.6636, 1187.2985, 1429.763, 1195.9076, 1157.3306]
2025-09-16 09:40:27,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:40:27,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1173.93) for latency 12
2025-09-16 09:40:27,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 18 minutes, 8 seconds)
2025-09-16 09:42:03,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:42:13,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1013.38611 ± 107.303
2025-09-16 09:42:13,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [898.5244, 1086.091, 1095.0487, 1068.9562, 897.91425, 975.6426, 1115.2258, 802.92365, 1054.4905, 1139.0447]
2025-09-16 09:42:13,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:42:13,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 16 minutes, 23 seconds)
2025-09-16 09:43:49,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:43:59,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1174.58887 ± 117.650
2025-09-16 09:43:59,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1022.5424, 1243.1951, 1078.8629, 1199.7622, 1165.6941, 1084.594, 1235.7931, 1366.6603, 1013.18384, 1335.6012]
2025-09-16 09:43:59,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:43:59,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1174.59) for latency 12
2025-09-16 09:43:59,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 14 minutes, 42 seconds)
2025-09-16 09:45:36,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:45:46,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1112.16907 ± 139.190
2025-09-16 09:45:46,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1061.2194, 872.24164, 1021.28455, 1325.0602, 1009.6094, 1365.6404, 1122.3269, 1159.8109, 1124.3589, 1060.1375]
2025-09-16 09:45:46,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:45:46,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 13 minutes, 3 seconds)
2025-09-16 09:47:23,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:47:33,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1044.58435 ± 142.342
2025-09-16 09:47:33,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1098.2881, 827.17395, 1260.4658, 875.1108, 1024.8708, 1250.0538, 1122.3698, 1079.3448, 880.3016, 1027.8643]
2025-09-16 09:47:33,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:47:33,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 11 minutes, 18 seconds)
2025-09-16 09:49:09,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:49:19,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1236.70386 ± 145.304
2025-09-16 09:49:19,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1121.7566, 1224.7461, 1139.7025, 1359.081, 1052.3873, 1434.2236, 1067.8955, 1245.9011, 1504.0055, 1217.339]
2025-09-16 09:49:19,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:49:19,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1236.70) for latency 12
2025-09-16 09:49:19,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 9 minutes, 30 seconds)
2025-09-16 09:50:55,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:51:05,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1130.67798 ± 168.766
2025-09-16 09:51:05,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [866.6183, 991.99475, 939.4339, 1347.7068, 1152.0073, 1218.7025, 1409.6906, 1048.0721, 1257.8568, 1074.6958]
2025-09-16 09:51:05,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:51:05,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 7 minutes, 46 seconds)
2025-09-16 09:52:42,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:52:52,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1009.03845 ± 158.796
2025-09-16 09:52:52,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [955.136, 1042.1216, 1120.6157, 1118.1907, 806.2407, 1313.958, 924.17786, 1000.91815, 1078.5503, 730.4763]
2025-09-16 09:52:52,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:52:52,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 5 minutes, 55 seconds)
2025-09-16 09:54:28,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:54:38,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1195.91345 ± 173.900
2025-09-16 09:54:38,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1207.316, 1005.6179, 1519.8301, 1255.1437, 1405.984, 1227.4952, 1256.3405, 1060.6388, 1114.593, 906.17615]
2025-09-16 09:54:38,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:54:38,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 4 minutes, 3 seconds)
2025-09-16 09:56:14,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:56:24,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1377.26196 ± 178.452
2025-09-16 09:56:24,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1347.1917, 1396.2457, 1329.0391, 1101.0157, 1642.825, 1419.502, 1327.4779, 1667.8303, 1439.0333, 1102.46]
2025-09-16 09:56:24,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:56:24,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1377.26) for latency 12
2025-09-16 09:56:24,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 2 minutes, 13 seconds)
2025-09-16 09:58:01,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:58:11,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1302.13354 ± 237.891
2025-09-16 09:58:11,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1356.0848, 1855.179, 1141.4174, 1132.1029, 1536.9358, 1049.2699, 1083.7002, 1397.6259, 1141.3413, 1327.6776]
2025-09-16 09:58:11,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:58:11,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 29 seconds)
2025-09-16 09:59:47,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 09:59:57,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1231.11462 ± 220.614
2025-09-16 09:59:57,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1082.013, 862.5026, 1603.8862, 1286.4741, 1426.6544, 1151.5116, 1518.4023, 1116.9147, 1016.20026, 1246.5873]
2025-09-16 09:59:57,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:59:57,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 58 minutes, 43 seconds)
2025-09-16 10:01:33,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:01:43,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1355.67932 ± 211.926
2025-09-16 10:01:43,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1550.514, 1271.8384, 1119.2156, 1408.3995, 1634.6365, 1485.1671, 1300.626, 916.2703, 1289.8986, 1580.2269]
2025-09-16 10:01:43,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:01:43,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 56 minutes, 58 seconds)
2025-09-16 10:03:20,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:03:30,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1431.33618 ± 173.688
2025-09-16 10:03:30,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1566.2843, 1332.6829, 1057.902, 1474.3911, 1685.7263, 1337.7208, 1341.1328, 1642.5753, 1382.1932, 1492.752]
2025-09-16 10:03:30,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:03:30,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1431.34) for latency 12
2025-09-16 10:03:30,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 55 minutes, 16 seconds)
2025-09-16 10:05:06,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:05:16,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1419.20715 ± 149.358
2025-09-16 10:05:16,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1399.4249, 1300.4553, 1746.0697, 1247.1631, 1452.6157, 1425.5427, 1265.3649, 1612.4309, 1423.643, 1319.3617]
2025-09-16 10:05:16,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:05:16,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 53 minutes, 30 seconds)
2025-09-16 10:06:53,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:07:02,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1448.50732 ± 92.870
2025-09-16 10:07:02,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1394.3041, 1241.7744, 1430.8823, 1427.3369, 1627.1892, 1439.5155, 1500.2821, 1496.372, 1429.277, 1498.14]
2025-09-16 10:07:02,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:07:02,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1448.51) for latency 12
2025-09-16 10:07:03,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 51 minutes, 42 seconds)
2025-09-16 10:08:39,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:08:49,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1537.34241 ± 161.515
2025-09-16 10:08:49,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1540.8519, 1297.9935, 1576.284, 1675.7511, 1321.7235, 1801.7383, 1458.2554, 1598.4191, 1718.7305, 1383.676]
2025-09-16 10:08:49,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:08:49,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1537.34) for latency 12
2025-09-16 10:08:49,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 49 minutes, 57 seconds)
2025-09-16 10:10:25,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:10:35,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1363.25647 ± 154.348
2025-09-16 10:10:35,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1425.2914, 1610.8146, 1388.3542, 1528.9651, 1331.7618, 1253.3145, 1210.4778, 1138.2634, 1202.6727, 1542.6499]
2025-09-16 10:10:35,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:10:35,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 48 minutes, 12 seconds)
2025-09-16 10:12:12,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:12:22,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1347.92798 ± 166.045
2025-09-16 10:12:22,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1286.666, 1242.7571, 1567.9954, 1116.3497, 1450.5359, 1648.4683, 1289.9797, 1123.6538, 1339.6688, 1413.204]
2025-09-16 10:12:22,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:12:22,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 46 minutes, 24 seconds)
2025-09-16 10:13:58,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:14:08,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1426.81653 ± 214.042
2025-09-16 10:14:08,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1218.1257, 1371.1815, 1474.0454, 1847.0203, 1482.5981, 1404.2919, 1556.0348, 1441.3368, 972.2592, 1501.2712]
2025-09-16 10:14:08,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:14:08,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 44 minutes, 37 seconds)
2025-09-16 10:15:44,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:15:54,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1354.88086 ± 161.512
2025-09-16 10:15:54,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1335.787, 1305.1951, 1284.1122, 1166.7487, 1271.7842, 1143.8674, 1417.668, 1362.4639, 1701.8713, 1559.312]
2025-09-16 10:15:54,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:15:54,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 42 minutes, 50 seconds)
2025-09-16 10:17:30,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:17:40,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1548.18311 ± 131.119
2025-09-16 10:17:40,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1344.2019, 1615.9988, 1619.0184, 1508.7139, 1609.254, 1418.9849, 1347.1655, 1698.9379, 1586.7146, 1732.8406]
2025-09-16 10:17:40,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:17:40,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1548.18) for latency 12
2025-09-16 10:17:40,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 40 minutes, 58 seconds)
2025-09-16 10:19:17,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:19:27,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1360.71313 ± 197.582
2025-09-16 10:19:27,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1404.4364, 1070.4364, 1234.5254, 1806.3251, 1127.5433, 1296.8121, 1375.3354, 1481.2902, 1484.4429, 1325.982]
2025-09-16 10:19:27,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:19:27,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 39 minutes, 10 seconds)
2025-09-16 10:21:03,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:21:12,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1482.90002 ± 151.407
2025-09-16 10:21:12,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1372.0073, 1312.2513, 1539.3192, 1646.6439, 1476.1772, 1656.4901, 1721.9344, 1236.4341, 1483.6667, 1384.0751]
2025-09-16 10:21:12,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:21:13,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 37 minutes, 17 seconds)
2025-09-16 10:22:48,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:22:58,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1419.84790 ± 149.380
2025-09-16 10:22:58,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1523.2764, 1566.5702, 1539.958, 1276.1094, 1363.1576, 1378.4404, 1082.4442, 1393.3363, 1471.8993, 1603.2863]
2025-09-16 10:22:58,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:22:58,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 35 minutes, 19 seconds)
2025-09-16 10:24:33,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:24:43,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1464.04358 ± 119.505
2025-09-16 10:24:43,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1570.3873, 1342.6825, 1675.1294, 1518.9572, 1310.4302, 1342.295, 1322.1534, 1538.0121, 1488.7446, 1531.6443]
2025-09-16 10:24:43,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:24:43,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 33 minutes, 18 seconds)
2025-09-16 10:26:17,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:26:27,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1555.22595 ± 151.672
2025-09-16 10:26:27,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1353.4622, 1646.8383, 1425.0775, 1539.789, 1412.5863, 1511.6931, 1531.4407, 1728.3539, 1886.1567, 1516.8632]
2025-09-16 10:26:27,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:26:27,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1555.23) for latency 12
2025-09-16 10:26:27,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 31 minutes, 17 seconds)
2025-09-16 10:28:02,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:28:11,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1599.91699 ± 117.202
2025-09-16 10:28:11,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1561.4197, 1715.1482, 1535.7, 1518.0896, 1758.4874, 1766.2283, 1414.7812, 1567.7098, 1473.7147, 1687.8911]
2025-09-16 10:28:11,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:28:11,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1599.92) for latency 12
2025-09-16 10:28:11,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 29 minutes, 11 seconds)
2025-09-16 10:29:46,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:29:56,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1348.63501 ± 127.416
2025-09-16 10:29:56,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1402.1505, 1108.2402, 1238.0618, 1266.8446, 1311.145, 1452.0955, 1362.1678, 1293.0778, 1572.5521, 1480.0155]
2025-09-16 10:29:56,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:29:56,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 27 minutes, 11 seconds)
2025-09-16 10:31:30,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:31:40,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1518.85034 ± 159.734
2025-09-16 10:31:40,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1274.8518, 1532.6282, 1390.2347, 1600.6635, 1371.5858, 1412.3193, 1557.9312, 1740.5345, 1492.6047, 1815.1504]
2025-09-16 10:31:40,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:31:40,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 25 minutes, 17 seconds)
2025-09-16 10:33:14,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:33:24,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1459.65161 ± 164.167
2025-09-16 10:33:24,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1589.4733, 1446.9261, 1158.1099, 1496.0773, 1548.698, 1718.3572, 1338.0698, 1477.655, 1592.7538, 1230.3947]
2025-09-16 10:33:24,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:33:24,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 23 minutes, 24 seconds)
2025-09-16 10:34:58,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:35:08,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1430.84155 ± 189.875
2025-09-16 10:35:08,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1455.6001, 1741.0468, 1460.4752, 1507.7317, 1453.3973, 1595.46, 992.9621, 1267.2877, 1345.0248, 1489.4305]
2025-09-16 10:35:08,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:35:08,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 21 minutes, 35 seconds)
2025-09-16 10:36:42,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:36:52,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1553.45410 ± 147.400
2025-09-16 10:36:52,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1698.2848, 1531.7948, 1470.968, 1454.6007, 1683.3861, 1547.2125, 1442.6332, 1388.826, 1433.7988, 1883.036]
2025-09-16 10:36:52,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:36:52,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 19 minutes, 49 seconds)
2025-09-16 10:38:26,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:38:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1473.15503 ± 80.200
2025-09-16 10:38:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1427.9553, 1586.0089, 1444.3933, 1538.4744, 1517.3289, 1411.9524, 1440.0876, 1298.9261, 1539.807, 1526.6162]
2025-09-16 10:38:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:38:36,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 18 minutes, 2 seconds)
2025-09-16 10:40:10,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:40:20,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1580.73853 ± 132.962
2025-09-16 10:40:20,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1635.1033, 1505.8749, 1397.599, 1664.4011, 1607.812, 1666.5243, 1316.5004, 1652.2379, 1569.5571, 1791.7762]
2025-09-16 10:40:20,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:40:20,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 16 minutes, 14 seconds)
2025-09-16 10:41:54,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:42:03,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1650.27368 ± 134.379
2025-09-16 10:42:03,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1826.6572, 1400.6926, 1656.3011, 1682.5503, 1554.0919, 1691.7809, 1868.8926, 1703.5743, 1496.3978, 1621.7972]
2025-09-16 10:42:03,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:42:03,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1650.27) for latency 12
2025-09-16 10:42:03,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 14 minutes, 26 seconds)
2025-09-16 10:43:37,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:43:47,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1606.43347 ± 161.812
2025-09-16 10:43:47,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1734.4939, 1885.5026, 1460.7642, 1815.6106, 1413.1892, 1525.7, 1572.2131, 1533.0365, 1406.6794, 1717.1458]
2025-09-16 10:43:47,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:43:47,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 12 minutes, 40 seconds)
2025-09-16 10:45:21,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:45:31,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1624.26721 ± 102.078
2025-09-16 10:45:31,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1493.4172, 1670.3467, 1647.2352, 1539.4904, 1450.8615, 1636.2136, 1740.7017, 1649.1244, 1610.2996, 1804.9819]
2025-09-16 10:45:31,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:45:31,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 10 minutes, 52 seconds)
2025-09-16 10:47:04,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:47:14,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1713.68433 ± 174.301
2025-09-16 10:47:14,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1765.6742, 1619.3472, 1836.7281, 1767.8451, 1549.4852, 1912.5975, 1297.957, 1757.063, 1736.5404, 1893.6074]
2025-09-16 10:47:14,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:47:14,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1713.68) for latency 12
2025-09-16 10:47:14,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 9 minutes, 3 seconds)
2025-09-16 10:48:47,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:48:57,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1620.80273 ± 312.214
2025-09-16 10:48:57,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1460.6671, 1769.3553, 1771.6843, 1737.0945, 1612.9254, 1707.9293, 1660.1471, 1900.5707, 1836.7262, 750.92804]
2025-09-16 10:48:57,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:48:57,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 7 minutes, 15 seconds)
2025-09-16 10:50:30,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:50:40,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1621.95337 ± 174.516
2025-09-16 10:50:40,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1600.4398, 1935.2808, 1511.4714, 1672.1934, 1699.0966, 1896.0228, 1514.4253, 1369.2432, 1440.1849, 1581.176]
2025-09-16 10:50:40,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:50:40,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 5 minutes, 28 seconds)
2025-09-16 10:52:13,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:52:23,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1622.36621 ± 186.234
2025-09-16 10:52:23,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1605.3716, 1407.1406, 1648.3699, 1744.2026, 2106.6216, 1597.4115, 1541.1608, 1454.4149, 1500.897, 1618.0721]
2025-09-16 10:52:23,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:52:23,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 39 seconds)
2025-09-16 10:53:57,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:54:07,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1781.91479 ± 104.490
2025-09-16 10:54:07,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1819.7892, 1916.436, 1759.582, 1541.7457, 1688.5984, 1811.2926, 1739.1785, 1797.3796, 1831.3646, 1913.7812]
2025-09-16 10:54:07,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:54:07,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1781.91) for latency 12
2025-09-16 10:54:07,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 55 seconds)
2025-09-16 10:55:40,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:55:50,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1759.39294 ± 78.769
2025-09-16 10:55:50,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1660.3035, 1865.8324, 1836.8376, 1803.0936, 1823.8522, 1738.7926, 1799.7526, 1669.1484, 1623.532, 1772.7838]
2025-09-16 10:55:50,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:55:50,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 12 seconds)
2025-09-16 10:57:24,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:57:33,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1715.82520 ± 146.777
2025-09-16 10:57:33,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1500.2892, 1664.5253, 1870.8838, 1977.4911, 1811.0939, 1666.1698, 1666.4966, 1485.8854, 1799.4183, 1715.9983]
2025-09-16 10:57:33,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:57:33,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 31 seconds)
2025-09-16 10:59:07,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 10:59:17,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1525.22095 ± 202.199
2025-09-16 10:59:17,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1453.2266, 1018.64417, 1522.0834, 1687.9379, 1602.8512, 1461.0844, 1689.8038, 1811.0819, 1485.4146, 1520.0817]
2025-09-16 10:59:17,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:59:17,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 52 seconds)
2025-09-16 11:00:51,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:01:01,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1674.90747 ± 101.405
2025-09-16 11:01:01,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1662.2805, 1731.4603, 1479.8488, 1557.8671, 1711.2906, 1810.937, 1798.52, 1576.9631, 1726.806, 1693.1013]
2025-09-16 11:01:01,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:01:01,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 14 seconds)
2025-09-16 11:02:35,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:02:44,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1716.86719 ± 150.767
2025-09-16 11:02:44,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1801.0349, 1508.923, 1854.9474, 1598.179, 1926.6766, 1588.4066, 1725.446, 1949.4141, 1555.9949, 1659.65]
2025-09-16 11:02:44,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:02:44,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 53 minutes, 30 seconds)
2025-09-16 11:04:18,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:04:27,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1738.57007 ± 157.039
2025-09-16 11:04:27,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1667.9768, 1662.3228, 1703.6283, 1577.3219, 1766.4155, 1564.6221, 1958.5402, 2080.0732, 1631.9308, 1772.8693]
2025-09-16 11:04:27,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:04:27,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 44 seconds)
2025-09-16 11:06:01,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:06:10,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1558.16919 ± 184.960
2025-09-16 11:06:10,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1451.5574, 1552.8491, 1157.2606, 1622.6577, 1673.7886, 1388.3455, 1664.2445, 1823.5582, 1494.6057, 1752.824]
2025-09-16 11:06:10,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:06:10,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 58 seconds)
2025-09-16 11:07:44,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:07:54,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1658.65820 ± 140.779
2025-09-16 11:07:54,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1684.2129, 1886.1326, 1596.2311, 1816.4426, 1653.6115, 1770.5876, 1740.2291, 1509.3746, 1479.0996, 1450.6593]
2025-09-16 11:07:54,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:07:54,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 13 seconds)
2025-09-16 11:09:28,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:09:37,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1646.98047 ± 174.690
2025-09-16 11:09:37,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1735.9009, 1768.2754, 1291.286, 1754.4236, 1705.9103, 1829.9066, 1648.2802, 1391.7434, 1528.6063, 1815.4722]
2025-09-16 11:09:37,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:09:37,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 28 seconds)
2025-09-16 11:11:11,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:11:21,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1589.24292 ± 150.967
2025-09-16 11:11:21,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1633.5941, 1503.5206, 1690.9816, 1700.5137, 1689.014, 1791.7212, 1676.0306, 1318.8145, 1549.2991, 1338.9395]
2025-09-16 11:11:21,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:11:21,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 44 seconds)
2025-09-16 11:12:54,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:13:04,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1837.45312 ± 125.673
2025-09-16 11:13:04,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1824.3812, 1736.1068, 2044.963, 1740.3018, 2037.971, 1672.5858, 1709.6875, 1877.8019, 1928.9941, 1801.739]
2025-09-16 11:13:04,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:13:04,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1837.45) for latency 12
2025-09-16 11:13:04,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 3 seconds)
2025-09-16 11:14:38,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:14:47,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1714.51599 ± 203.404
2025-09-16 11:14:47,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1495.7361, 1514.975, 1969.519, 1648.8618, 1433.2738, 1939.3047, 1961.5269, 1525.862, 1876.5452, 1779.5547]
2025-09-16 11:14:47,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:14:47,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 21 seconds)
2025-09-16 11:16:21,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:16:31,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1848.62537 ± 135.865
2025-09-16 11:16:31,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [2053.6182, 1905.1991, 1689.9125, 1696.2628, 1743.28, 1731.2227, 1884.0914, 1822.2522, 1858.7527, 2101.6628]
2025-09-16 11:16:31,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:16:31,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1848.63) for latency 12
2025-09-16 11:16:31,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 37 seconds)
2025-09-16 11:18:04,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:18:14,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1716.39685 ± 79.207
2025-09-16 11:18:14,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1853.8666, 1658.9613, 1643.8029, 1712.2662, 1671.5684, 1668.7323, 1606.5273, 1803.0664, 1722.2191, 1822.9591]
2025-09-16 11:18:14,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:18:14,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 51 seconds)
2025-09-16 11:19:47,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:19:57,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1824.25098 ± 175.263
2025-09-16 11:19:57,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1482.4368, 1703.9828, 1736.1427, 1621.4518, 2004.7373, 1921.8795, 1896.4878, 1951.2877, 1849.0536, 2075.05]
2025-09-16 11:19:57,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:19:57,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 6 seconds)
2025-09-16 11:21:30,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:21:40,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1846.01099 ± 141.463
2025-09-16 11:21:40,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1932.6105, 2016.1802, 1741.7535, 1680.5781, 1762.8402, 2110.4587, 1957.7675, 1802.9674, 1782.8383, 1672.1145]
2025-09-16 11:21:40,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:21:40,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 23 seconds)
2025-09-16 11:23:14,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:23:23,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1660.81079 ± 113.907
2025-09-16 11:23:23,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1681.3835, 1700.4802, 1823.3295, 1728.2823, 1635.8861, 1761.458, 1486.7224, 1692.0414, 1427.06, 1671.4636]
2025-09-16 11:23:23,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:23:23,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 39 seconds)
2025-09-16 11:24:57,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:25:07,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1796.31152 ± 183.007
2025-09-16 11:25:07,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1589.117, 1865.2252, 1927.7435, 2052.9424, 1912.0093, 1897.9176, 1767.6853, 1738.9778, 1833.4066, 1378.0897]
2025-09-16 11:25:07,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:25:07,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 57 seconds)
2025-09-16 11:26:40,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:26:50,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1768.01562 ± 104.141
2025-09-16 11:26:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1804.4707, 1830.6393, 1891.9324, 1668.5448, 1648.337, 1664.1604, 1919.148, 1622.2295, 1861.5453, 1769.1498]
2025-09-16 11:26:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:26:50,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 15 seconds)
2025-09-16 11:28:24,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:28:33,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1768.64319 ± 176.305
2025-09-16 11:28:33,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1589.273, 1756.566, 1597.183, 2009.1323, 1736.8733, 1917.4823, 1893.567, 1570.1488, 2042.6221, 1573.5844]
2025-09-16 11:28:33,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:28:33,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 33 seconds)
2025-09-16 11:30:07,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:30:17,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1827.01050 ± 200.561
2025-09-16 11:30:17,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1544.4506, 1972.949, 1557.5259, 1834.5747, 1761.7012, 1989.9818, 1803.8649, 2027.3197, 2162.0828, 1615.6572]
2025-09-16 11:30:17,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:30:17,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 50 seconds)
2025-09-16 11:31:50,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:32:00,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1775.85486 ± 136.929
2025-09-16 11:32:00,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1854.654, 1493.2762, 1562.6405, 1880.3512, 1743.6047, 1856.4097, 1744.9921, 1857.3464, 1942.3374, 1822.9368]
2025-09-16 11:32:00,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:32:00,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 6 seconds)
2025-09-16 11:33:33,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:33:43,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1607.69775 ± 315.013
2025-09-16 11:33:43,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1680.9147, 1938.8779, 757.3749, 1582.7292, 1910.8418, 1721.0131, 1716.9204, 1566.6726, 1733.4275, 1468.2054]
2025-09-16 11:33:43,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:33:43,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 22 seconds)
2025-09-16 11:35:16,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:35:26,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1881.10522 ± 149.035
2025-09-16 11:35:26,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [2109.7632, 2080.536, 1855.9006, 1661.5297, 1844.3973, 1926.5297, 1640.414, 1812.4583, 2004.0759, 1875.45]
2025-09-16 11:35:26,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:35:26,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1881.11) for latency 12
2025-09-16 11:35:26,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 38 seconds)
2025-09-16 11:37:00,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:37:09,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1690.52612 ± 78.029
2025-09-16 11:37:09,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1622.3595, 1670.7256, 1693.9578, 1748.382, 1704.985, 1555.1913, 1615.1315, 1849.272, 1745.1746, 1700.082]
2025-09-16 11:37:09,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:37:09,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 55 seconds)
2025-09-16 11:38:43,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:38:53,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1794.33105 ± 104.454
2025-09-16 11:38:53,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1720.756, 1832.1216, 1783.0499, 1877.0316, 1851.8529, 1714.2235, 1823.5499, 1953.0538, 1553.6086, 1834.0637]
2025-09-16 11:38:53,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:38:53,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 12 seconds)
2025-09-16 11:40:26,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:40:36,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1682.98413 ± 199.259
2025-09-16 11:40:36,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1816.4153, 1398.7186, 1767.0945, 1426.1757, 2003.3447, 1931.8837, 1633.4119, 1444.6642, 1685.598, 1722.5359]
2025-09-16 11:40:36,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:40:36,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 29 seconds)
2025-09-16 11:42:10,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:42:20,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1690.33984 ± 114.046
2025-09-16 11:42:20,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1584.7211, 1740.0651, 1736.2339, 1544.3373, 1702.7788, 1871.9847, 1649.5975, 1633.3958, 1883.2168, 1557.0679]
2025-09-16 11:42:20,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:42:20,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 46 seconds)
2025-09-16 11:43:53,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:44:03,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1725.36328 ± 142.280
2025-09-16 11:44:03,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1718.5343, 1854.6134, 1725.2098, 1436.9989, 1897.6202, 1880.0996, 1835.4698, 1598.8816, 1581.1669, 1725.0377]
2025-09-16 11:44:03,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:44:03,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 3 seconds)
2025-09-16 11:45:37,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:45:46,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1767.96606 ± 100.911
2025-09-16 11:45:46,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1972.2742, 1660.4033, 1868.6199, 1802.2043, 1849.0049, 1747.9347, 1698.5392, 1672.2548, 1767.9965, 1640.4281]
2025-09-16 11:45:46,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:45:46,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 20 seconds)
2025-09-16 11:47:20,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:47:30,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1807.14087 ± 111.361
2025-09-16 11:47:30,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1879.9734, 1888.9688, 1888.679, 1809.2162, 1605.7262, 1613.128, 1910.5151, 1798.131, 1753.646, 1923.4254]
2025-09-16 11:47:30,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:47:30,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 36 seconds)
2025-09-16 11:49:03,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:49:13,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1785.75024 ± 171.546
2025-09-16 11:49:13,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1809.0986, 1687.6244, 1706.4585, 1479.1934, 1830.6301, 1967.5162, 2073.4802, 1738.3938, 1604.0355, 1961.0714]
2025-09-16 11:49:13,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:49:13,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 53 seconds)
2025-09-16 11:50:46,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:50:56,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1839.25427 ± 113.740
2025-09-16 11:50:56,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1864.2905, 1739.8014, 1964.9406, 1757.6907, 1867.6012, 1881.2072, 1690.5076, 2008.474, 1665.4067, 1952.624]
2025-09-16 11:50:56,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:50:56,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 9 seconds)
2025-09-16 11:52:30,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:52:39,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1794.72229 ± 139.167
2025-09-16 11:52:39,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1601.1005, 1986.0516, 1611.5508, 1818.8785, 2004.4165, 1829.3557, 1683.3998, 1875.1819, 1868.7585, 1668.5303]
2025-09-16 11:52:39,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:52:39,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 26 seconds)
2025-09-16 11:54:13,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:54:23,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1755.62598 ± 190.010
2025-09-16 11:54:23,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1808.958, 1725.3146, 1515.9169, 1710.4784, 1483.766, 1781.7008, 1544.1385, 1960.4703, 1966.5713, 2058.9446]
2025-09-16 11:54:23,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:54:23,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 43 seconds)
2025-09-16 11:55:57,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 11:56:06,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1849.40601 ± 108.814
2025-09-16 11:56:06,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [1978.999, 2014.9606, 1786.484, 1789.7766, 1742.9392, 1802.4156, 1804.3724, 1953.6685, 1673.888, 1946.5564]
2025-09-16 11:56:06,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:56:06,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1251 [DEBUG]: Training session finished
