2025-09-14 15:39:53,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_24
2025-09-14 15:39:53,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_24
2025-09-14 15:39:53,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x7f1ced983e00>}
2025-09-14 15:39:53,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 15:39:53,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 15:39:53,473 baseline-bpql-noisepromille75-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 15:39:53,474 baseline-bpql-noisepromille75-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 15:39:55,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 15:39:55,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 15:42:34,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:42:44,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -339.49619 ± 39.651
2025-09-14 15:42:44,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-359.87326), np.float32(-360.60342), np.float32(-414.18884), np.float32(-286.8564), np.float32(-321.87686), np.float32(-318.29785), np.float32(-274.0354), np.float32(-373.12646), np.float32(-330.59286), np.float32(-355.51086)]
2025-09-14 15:42:44,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:42:44,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-339.50) for latency 24
2025-09-14 15:42:44,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 38 minutes, 45 seconds)
2025-09-14 15:45:18,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:45:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -284.77353 ± 34.726
2025-09-14 15:45:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-316.00623), np.float32(-269.41595), np.float32(-326.96008), np.float32(-265.24088), np.float32(-302.6971), np.float32(-274.58597), np.float32(-254.94038), np.float32(-306.54547), np.float32(-321.20245), np.float32(-210.14078)]
2025-09-14 15:45:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:45:27,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-284.77) for latency 24
2025-09-14 15:45:27,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 31 minutes, 32 seconds)
2025-09-14 15:48:07,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:48:16,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -143.96021 ± 83.102
2025-09-14 15:48:16,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-251.86674), np.float32(-110.502304), np.float32(-280.92593), np.float32(-28.758148), np.float32(-60.92705), np.float32(-119.00411), np.float32(-184.51306), np.float32(-62.315502), np.float32(-229.72667), np.float32(-111.06245)]
2025-09-14 15:48:16,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:48:16,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-143.96) for latency 24
2025-09-14 15:48:16,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 30 minutes, 19 seconds)
2025-09-14 15:50:54,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:51:03,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 156.51109 ± 176.557
2025-09-14 15:51:03,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(275.14926), np.float32(332.7071), np.float32(265.71103), np.float32(353.64893), np.float32(49.69739), np.float32(-163.45743), np.float32(-151.0727), np.float32(221.08757), np.float32(211.9869), np.float32(169.65288)]
2025-09-14 15:51:03,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:51:03,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (156.51) for latency 24
2025-09-14 15:51:03,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 27 minutes, 25 seconds)
2025-09-14 15:53:36,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:53:45,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 879.31531 ± 82.391
2025-09-14 15:53:45,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(827.53845), np.float32(964.5703), np.float32(772.1869), np.float32(749.07513), np.float32(822.0235), np.float32(925.0354), np.float32(979.0426), np.float32(847.20874), np.float32(919.9382), np.float32(986.5349)]
2025-09-14 15:53:45,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:53:45,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (879.32) for latency 24
2025-09-14 15:53:45,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 22 minutes, 57 seconds)
2025-09-14 15:56:25,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:56:34,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1599.91260 ± 96.988
2025-09-14 15:56:34,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1489.671), np.float32(1720.5406), np.float32(1483.0809), np.float32(1654.2773), np.float32(1551.8477), np.float32(1760.6533), np.float32(1545.437), np.float32(1505.4524), np.float32(1702.4668), np.float32(1585.6981)]
2025-09-14 15:56:34,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:56:34,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1599.91) for latency 24
2025-09-14 15:56:34,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 20 minutes, 5 seconds)
2025-09-14 15:59:06,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 15:59:16,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1632.49487 ± 203.837
2025-09-14 15:59:16,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1547.8336), np.float32(1419.7931), np.float32(1383.18), np.float32(1720.9388), np.float32(2114.761), np.float32(1619.2915), np.float32(1423.3705), np.float32(1735.7242), np.float32(1706.8859), np.float32(1653.1711)]
2025-09-14 15:59:16,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:59:16,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1632.49) for latency 24
2025-09-14 15:59:16,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 16 minutes, 45 seconds)
2025-09-14 16:01:46,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:01:55,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1917.30151 ± 192.313
2025-09-14 16:01:55,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1896.9341), np.float32(2075.3508), np.float32(1642.6526), np.float32(1752.0416), np.float32(2281.8662), np.float32(2091.9849), np.float32(1891.3636), np.float32(1642.3949), np.float32(1948.0532), np.float32(1950.374)]
2025-09-14 16:01:55,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:01:55,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1917.30) for latency 24
2025-09-14 16:01:55,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 11 minutes, 5 seconds)
2025-09-14 16:04:35,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:04:44,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1986.58911 ± 119.514
2025-09-14 16:04:44,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1955.1632), np.float32(1945.1903), np.float32(1816.093), np.float32(2053.5713), np.float32(2026.0123), np.float32(1869.8435), np.float32(2280.858), np.float32(1949.699), np.float32(2026.1687), np.float32(1943.2932)]
2025-09-14 16:04:44,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:04:44,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1986.59) for latency 24
2025-09-14 16:04:44,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 9 minutes)
2025-09-14 16:07:19,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:07:29,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2048.47998 ± 154.660
2025-09-14 16:07:29,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2035.1595), np.float32(1987.6184), np.float32(2328.2783), np.float32(1837.0573), np.float32(2238.9578), np.float32(2112.8748), np.float32(2010.8544), np.float32(2095.2527), np.float32(2051.5835), np.float32(1787.1626)]
2025-09-14 16:07:29,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:07:29,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2048.48) for latency 24
2025-09-14 16:07:29,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 7 minutes, 2 seconds)
2025-09-14 16:10:04,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:10:13,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1971.27478 ± 145.031
2025-09-14 16:10:13,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1658.9725), np.float32(1842.626), np.float32(2031.641), np.float32(2165.435), np.float32(1967.2064), np.float32(1889.3322), np.float32(2157.9775), np.float32(1949.3562), np.float32(2084.4775), np.float32(1965.7253)]
2025-09-14 16:10:13,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:10:13,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 3 minutes, 2 seconds)
2025-09-14 16:12:52,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:13:01,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1976.97229 ± 142.202
2025-09-14 16:13:01,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1882.3519), np.float32(2036.3695), np.float32(1705.7319), np.float32(2054.8958), np.float32(2115.707), np.float32(2011.3511), np.float32(1999.3701), np.float32(2023.2748), np.float32(1761.8561), np.float32(2178.8154)]
2025-09-14 16:13:01,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:13:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 2 minutes, 8 seconds)
2025-09-14 16:15:33,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:15:42,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2123.93896 ± 160.338
2025-09-14 16:15:42,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2108.167), np.float32(2200.8984), np.float32(1915.9116), np.float32(2082.8215), np.float32(2196.9517), np.float32(2414.6248), np.float32(1931.0673), np.float32(2259.727), np.float32(1901.1515), np.float32(2228.0688)]
2025-09-14 16:15:42,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:15:42,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2123.94) for latency 24
2025-09-14 16:15:42,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 59 minutes, 53 seconds)
2025-09-14 16:18:20,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:18:29,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2143.77808 ± 164.177
2025-09-14 16:18:29,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2308.133), np.float32(1797.3507), np.float32(1908.4526), np.float32(2261.7195), np.float32(2182.576), np.float32(2231.8923), np.float32(2226.95), np.float32(2168.9966), np.float32(2306.9028), np.float32(2044.8081)]
2025-09-14 16:18:29,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:18:29,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2143.78) for latency 24
2025-09-14 16:18:29,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 56 minutes, 20 seconds)
2025-09-14 16:21:06,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:21:16,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2078.03223 ± 114.391
2025-09-14 16:21:16,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2034.7222), np.float32(2037.2245), np.float32(2199.5623), np.float32(2202.8154), np.float32(2087.9077), np.float32(1916.314), np.float32(2089.2905), np.float32(2191.4707), np.float32(1854.613), np.float32(2166.4028)]
2025-09-14 16:21:16,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:21:16,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 54 minutes, 16 seconds)
2025-09-14 16:23:49,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:23:58,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2225.40576 ± 135.373
2025-09-14 16:23:58,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2049.0205), np.float32(2103.4856), np.float32(2328.9595), np.float32(2387.0642), np.float32(2267.1965), np.float32(2325.2778), np.float32(2015.6765), np.float32(2232.6372), np.float32(2414.2422), np.float32(2130.4988)]
2025-09-14 16:23:58,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:23:58,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2225.41) for latency 24
2025-09-14 16:23:58,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 51 minutes, 4 seconds)
2025-09-14 16:26:36,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:26:45,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2124.82373 ± 149.967
2025-09-14 16:26:45,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1870.2115), np.float32(2204.4348), np.float32(1995.9918), np.float32(2047.2357), np.float32(2124.5933), np.float32(1993.5223), np.float32(2355.2578), np.float32(2073.35), np.float32(2257.5125), np.float32(2326.1284)]
2025-09-14 16:26:45,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:26:45,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 48 minutes, 3 seconds)
2025-09-14 16:29:23,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:29:32,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2048.57617 ± 106.877
2025-09-14 16:29:32,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2108.3938), np.float32(1960.9099), np.float32(2218.4321), np.float32(2017.9386), np.float32(1887.3351), np.float32(2146.5698), np.float32(2196.046), np.float32(1953.4005), np.float32(1968.9991), np.float32(2027.7358)]
2025-09-14 16:29:32,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:29:32,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 46 minutes, 44 seconds)
2025-09-14 16:32:04,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:32:13,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2287.82275 ± 126.651
2025-09-14 16:32:13,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2381.922), np.float32(2211.5974), np.float32(2380.7078), np.float32(2148.7375), np.float32(2167.9968), np.float32(2308.0845), np.float32(2248.059), np.float32(2552.5435), np.float32(2354.6777), np.float32(2123.9006)]
2025-09-14 16:32:13,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:32:13,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2287.82) for latency 24
2025-09-14 16:32:13,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 42 minutes, 32 seconds)
2025-09-14 16:34:52,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:35:01,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2400.25317 ± 146.975
2025-09-14 16:35:01,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2183.64), np.float32(2295.4724), np.float32(2331.4849), np.float32(2538.3442), np.float32(2438.161), np.float32(2315.783), np.float32(2246.5803), np.float32(2502.838), np.float32(2698.5115), np.float32(2451.7166)]
2025-09-14 16:35:01,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:35:01,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2400.25) for latency 24
2025-09-14 16:35:01,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 40 minutes)
2025-09-14 16:37:40,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:37:49,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2369.28955 ± 164.889
2025-09-14 16:37:49,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2302.6365), np.float32(2104.0935), np.float32(2354.379), np.float32(2550.4233), np.float32(2569.022), np.float32(2343.6545), np.float32(2229.4312), np.float32(2165.0542), np.float32(2591.5862), np.float32(2482.614)]
2025-09-14 16:37:49,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:37:49,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 38 minutes, 49 seconds)
2025-09-14 16:40:23,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:40:33,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2362.48389 ± 128.838
2025-09-14 16:40:33,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2335.3462), np.float32(2553.8984), np.float32(2320.9033), np.float32(2176.0344), np.float32(2293.746), np.float32(2341.4443), np.float32(2169.5444), np.float32(2558.076), np.float32(2431.0852), np.float32(2444.7605)]
2025-09-14 16:40:33,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:40:33,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 35 minutes, 9 seconds)
2025-09-14 16:43:13,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:43:22,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2643.14893 ± 127.606
2025-09-14 16:43:22,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2712.6948), np.float32(2623.0579), np.float32(2561.274), np.float32(2407.2373), np.float32(2520.7942), np.float32(2651.796), np.float32(2916.1475), np.float32(2675.5117), np.float32(2652.568), np.float32(2710.4102)]
2025-09-14 16:43:22,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:43:22,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2643.15) for latency 24
2025-09-14 16:43:22,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 33 minutes, 6 seconds)
2025-09-14 16:46:02,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:46:12,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2648.02319 ± 107.084
2025-09-14 16:46:12,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2773.575), np.float32(2566.348), np.float32(2586.4604), np.float32(2688.0405), np.float32(2610.6875), np.float32(2861.4326), np.float32(2684.2827), np.float32(2575.6719), np.float32(2667.331), np.float32(2466.4043)]
2025-09-14 16:46:12,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:46:12,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2648.02) for latency 24
2025-09-14 16:46:12,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 32 minutes, 27 seconds)
2025-09-14 16:48:43,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:48:52,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2539.73706 ± 207.209
2025-09-14 16:48:52,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2181.1323), np.float32(2601.8752), np.float32(2470.543), np.float32(2578.9348), np.float32(2970.9622), np.float32(2468.6624), np.float32(2292.9338), np.float32(2505.5015), np.float32(2722.1099), np.float32(2604.714)]
2025-09-14 16:48:52,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:48:52,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 27 minutes, 49 seconds)
2025-09-14 16:51:29,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:51:38,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2797.93872 ± 83.851
2025-09-14 16:51:38,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2817.8083), np.float32(2734.757), np.float32(2909.3782), np.float32(2963.3564), np.float32(2769.2515), np.float32(2650.8), np.float32(2744.0015), np.float32(2774.3154), np.float32(2806.4863), np.float32(2809.233)]
2025-09-14 16:51:38,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:51:38,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2797.94) for latency 24
2025-09-14 16:51:38,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 24 minutes, 23 seconds)
2025-09-14 16:54:16,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:54:25,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2771.11865 ± 95.119
2025-09-14 16:54:25,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2781.7307), np.float32(2660.212), np.float32(2715.6514), np.float32(2680.7563), np.float32(2855.2668), np.float32(2705.7695), np.float32(2863.2153), np.float32(2661.2422), np.float32(2943.0547), np.float32(2844.2896)]
2025-09-14 16:54:25,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:54:25,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 22 minutes, 30 seconds)
2025-09-14 16:57:00,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:57:09,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2716.09692 ± 132.073
2025-09-14 16:57:09,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2683.0806), np.float32(2716.7402), np.float32(2941.7625), np.float32(2634.7947), np.float32(2672.7874), np.float32(2666.4236), np.float32(2890.855), np.float32(2791.869), np.float32(2720.7175), np.float32(2441.9397)]
2025-09-14 16:57:09,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:57:09,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 18 minutes, 25 seconds)
2025-09-14 16:59:46,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:59:55,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2793.02148 ± 146.193
2025-09-14 16:59:55,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2751.265), np.float32(2975.2231), np.float32(2892.4316), np.float32(2526.9524), np.float32(2667.5488), np.float32(2869.8718), np.float32(2595.4053), np.float32(2862.9678), np.float32(2816.7866), np.float32(2971.7625)]
2025-09-14 16:59:55,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:59:55,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 14 minutes, 54 seconds)
2025-09-14 17:02:31,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:02:40,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2719.30786 ± 112.927
2025-09-14 17:02:40,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2804.051), np.float32(2583.606), np.float32(2798.2244), np.float32(2668.6858), np.float32(2693.1438), np.float32(2881.7336), np.float32(2905.0032), np.float32(2618.7983), np.float32(2590.506), np.float32(2649.3254)]
2025-09-14 17:02:40,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:02:40,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 13 minutes, 18 seconds)
2025-09-14 17:05:11,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:05:21,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2731.93579 ± 85.541
2025-09-14 17:05:21,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2595.225), np.float32(2714.7817), np.float32(2677.191), np.float32(2705.4062), np.float32(2785.758), np.float32(2639.5095), np.float32(2677.8118), np.float32(2827.6196), np.float32(2845.9229), np.float32(2850.1316)]
2025-09-14 17:05:21,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:05:21,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 9 minutes, 14 seconds)
2025-09-14 17:07:59,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:08:08,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2726.18896 ± 120.268
2025-09-14 17:08:08,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2907.2417), np.float32(2711.884), np.float32(2528.046), np.float32(2749.6665), np.float32(2642.1848), np.float32(2727.0173), np.float32(2733.1062), np.float32(2686.7944), np.float32(2621.3076), np.float32(2954.641)]
2025-09-14 17:08:08,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:08:08,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 6 minutes, 35 seconds)
2025-09-14 17:10:46,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:10:56,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2829.77100 ± 118.476
2025-09-14 17:10:56,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2797.826), np.float32(2772.2646), np.float32(2749.871), np.float32(2864.966), np.float32(2946.7024), np.float32(3040.16), np.float32(2838.233), np.float32(2902.7568), np.float32(2809.4587), np.float32(2575.4705)]
2025-09-14 17:10:56,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:10:56,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2829.77) for latency 24
2025-09-14 17:10:56,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 4 minutes, 37 seconds)
2025-09-14 17:14:12,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:14:22,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2799.45850 ± 72.140
2025-09-14 17:14:22,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2710.961), np.float32(2816.495), np.float32(2812.6025), np.float32(2792.3992), np.float32(2932.985), np.float32(2693.2627), np.float32(2886.7896), np.float32(2749.669), np.float32(2752.3616), np.float32(2847.0583)]
2025-09-14 17:14:22,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:14:22,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 10 minutes, 43 seconds)
2025-09-14 17:17:46,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:17:56,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2799.31982 ± 95.973
2025-09-14 17:17:56,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2742.3474), np.float32(2818.3782), np.float32(2755.4836), np.float32(2740.7585), np.float32(2872.97), np.float32(2718.328), np.float32(2952.7188), np.float32(2667.034), np.float32(2757.406), np.float32(2967.774)]
2025-09-14 17:17:56,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:17:56,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 18 minutes, 24 seconds)
2025-09-14 17:21:17,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:21:26,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2513.33643 ± 860.583
2025-09-14 17:21:26,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2788.5798), np.float32(2938.002), np.float32(2787.1223), np.float32(2770.049), np.float32(2634.9885), np.float32(2806.2856), np.float32(2773.4966), np.float32(2865.0103), np.float32(2828.9436), np.float32(-59.112667)]
2025-09-14 17:21:26,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:21:26,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 25 minutes, 57 seconds)
2025-09-14 17:24:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:24:56,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2943.07178 ± 117.282
2025-09-14 17:24:56,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2900.4595), np.float32(2914.2656), np.float32(3076.7217), np.float32(3113.915), np.float32(3019.805), np.float32(2905.9219), np.float32(2806.0107), np.float32(2711.253), np.float32(2949.7551), np.float32(3032.6113)]
2025-09-14 17:24:56,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:24:56,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2943.07) for latency 24
2025-09-14 17:24:57,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 31 minutes, 43 seconds)
2025-09-14 17:28:14,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:28:23,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2858.37769 ± 134.941
2025-09-14 17:28:23,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2906.1816), np.float32(2885.7944), np.float32(2754.3467), np.float32(2702.1846), np.float32(3108.9348), np.float32(2925.3818), np.float32(2644.2583), np.float32(2747.8433), np.float32(2959.5488), np.float32(2949.303)]
2025-09-14 17:28:23,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:28:23,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 36 minutes, 29 seconds)
2025-09-14 17:31:42,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:31:52,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2799.06348 ± 81.214
2025-09-14 17:31:52,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2810.6255), np.float32(2933.9133), np.float32(2665.2107), np.float32(2682.8691), np.float32(2787.8455), np.float32(2864.5889), np.float32(2897.6929), np.float32(2779.3726), np.float32(2809.2334), np.float32(2759.284)]
2025-09-14 17:31:52,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:31:52,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 33 minutes, 27 seconds)
2025-09-14 17:35:08,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:35:18,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2802.86548 ± 107.846
2025-09-14 17:35:18,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2803.0337), np.float32(2805.5334), np.float32(2993.0007), np.float32(2817.1687), np.float32(2552.28), np.float32(2853.3298), np.float32(2713.5972), np.float32(2879.2114), np.float32(2825.6218), np.float32(2785.8796)]
2025-09-14 17:35:18,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:35:18,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 28 minutes, 17 seconds)
2025-09-14 17:38:33,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:38:43,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2868.15479 ± 147.815
2025-09-14 17:38:43,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2633.7634), np.float32(2992.1804), np.float32(3154.2417), np.float32(2916.214), np.float32(2861.5444), np.float32(2803.4624), np.float32(2949.122), np.float32(2804.5383), np.float32(2915.08), np.float32(2651.4004)]
2025-09-14 17:38:43,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:38:43,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 23 minutes, 52 seconds)
2025-09-14 17:42:00,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:42:10,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2948.17578 ± 113.221
2025-09-14 17:42:10,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3083.5884), np.float32(2827.9094), np.float32(2864.8352), np.float32(3061.9553), np.float32(3040.4785), np.float32(2952.6587), np.float32(2899.1467), np.float32(2898.503), np.float32(3101.1738), np.float32(2751.5085)]
2025-09-14 17:42:10,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:42:10,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2948.18) for latency 24
2025-09-14 17:42:10,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 19 minutes, 45 seconds)
2025-09-14 17:45:14,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:45:24,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2835.92896 ± 103.161
2025-09-14 17:45:24,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3040.213), np.float32(2895.0977), np.float32(2876.694), np.float32(2735.5193), np.float32(2880.028), np.float32(2790.5535), np.float32(2728.1382), np.float32(2936.16), np.float32(2689.6055), np.float32(2787.2817)]
2025-09-14 17:45:24,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:45:24,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 13 minutes, 51 seconds)
2025-09-14 17:48:25,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:48:34,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2852.01099 ± 104.896
2025-09-14 17:48:34,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3115.5366), np.float32(2872.9902), np.float32(2812.93), np.float32(2686.7773), np.float32(2833.607), np.float32(2761.04), np.float32(2842.7898), np.float32(2836.6558), np.float32(2857.2676), np.float32(2900.5156)]
2025-09-14 17:48:34,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:48:34,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 7 minutes, 5 seconds)
2025-09-14 17:51:31,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:51:40,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2850.58765 ± 72.801
2025-09-14 17:51:40,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2863.74), np.float32(2765.046), np.float32(2747.5686), np.float32(2959.0752), np.float32(2907.7866), np.float32(2804.2778), np.float32(2770.3147), np.float32(2899.4675), np.float32(2844.018), np.float32(2944.5818)]
2025-09-14 17:51:40,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:51:40,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 8 seconds)
2025-09-14 17:54:42,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:54:51,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2889.28247 ± 94.657
2025-09-14 17:54:51,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2948.0605), np.float32(2830.5103), np.float32(2701.5947), np.float32(2857.9988), np.float32(2980.3271), np.float32(2931.4348), np.float32(2956.0122), np.float32(2786.4453), np.float32(3033.781), np.float32(2866.6594)]
2025-09-14 17:54:51,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:54:51,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 54 minutes, 19 seconds)
2025-09-14 17:58:05,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:58:14,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2956.16357 ± 144.101
2025-09-14 17:58:14,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3049.4958), np.float32(3012.6226), np.float32(2727.401), np.float32(2797.594), np.float32(3003.6853), np.float32(2882.1763), np.float32(3196.326), np.float32(2776.718), np.float32(3058.9756), np.float32(3056.6387)]
2025-09-14 17:58:14,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:58:14,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2956.16) for latency 24
2025-09-14 17:58:14,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 50 minutes, 26 seconds)
2025-09-14 18:01:31,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:01:40,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2850.30713 ± 99.205
2025-09-14 18:01:40,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2974.5862), np.float32(2710.479), np.float32(2768.9536), np.float32(2724.33), np.float32(2845.0078), np.float32(2858.5151), np.float32(3031.5173), np.float32(2793.1753), np.float32(2889.981), np.float32(2906.527)]
2025-09-14 18:01:40,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:01:40,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 49 minutes, 19 seconds)
2025-09-14 18:05:00,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:05:09,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2885.28882 ± 108.435
2025-09-14 18:05:09,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2979.5552), np.float32(2899.0793), np.float32(2973.3875), np.float32(2846.33), np.float32(3033.8528), np.float32(2777.0974), np.float32(2696.5845), np.float32(2992.876), np.float32(2751.2834), np.float32(2902.8442)]
2025-09-14 18:05:09,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:05:09,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 49 minutes, 10 seconds)
2025-09-14 18:08:30,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:08:39,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2920.69653 ± 127.097
2025-09-14 18:08:39,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2713.4036), np.float32(2857.8364), np.float32(2792.8103), np.float32(2860.6868), np.float32(2921.1577), np.float32(3067.2913), np.float32(3074.151), np.float32(2826.459), np.float32(2979.3406), np.float32(3113.828)]
2025-09-14 18:08:39,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:08:39,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 49 minutes, 48 seconds)
2025-09-14 18:11:54,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:12:03,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2784.20996 ± 187.856
2025-09-14 18:12:03,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2605.548), np.float32(2842.404), np.float32(2323.426), np.float32(2781.4502), np.float32(3026.3293), np.float32(2815.4045), np.float32(2946.951), np.float32(2916.5432), np.float32(2764.2012), np.float32(2819.8416)]
2025-09-14 18:12:03,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:12:03,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 48 minutes, 33 seconds)
2025-09-14 18:15:15,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:15:25,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2941.51025 ± 99.076
2025-09-14 18:15:25,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2951.8552), np.float32(3122.1226), np.float32(2884.2668), np.float32(3022.8413), np.float32(2970.1077), np.float32(2975.508), np.float32(3027.211), np.float32(2828.348), np.float32(2783.8381), np.float32(2849.0066)]
2025-09-14 18:15:25,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:15:25,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 44 minutes, 48 seconds)
2025-09-14 18:18:31,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:18:41,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2912.24976 ± 105.084
2025-09-14 18:18:41,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2919.5288), np.float32(2912.2522), np.float32(3130.5151), np.float32(2904.101), np.float32(2886.355), np.float32(2795.3716), np.float32(3025.8818), np.float32(2972.656), np.float32(2767.1921), np.float32(2808.6445)]
2025-09-14 18:18:41,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:18:41,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 39 minutes, 50 seconds)
2025-09-14 18:21:50,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:22:00,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2850.45630 ± 241.211
2025-09-14 18:22:00,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2882.7507), np.float32(2800.167), np.float32(2705.7249), np.float32(2786.991), np.float32(2863.44), np.float32(2965.0605), np.float32(2966.4749), np.float32(2252.6492), np.float32(3121.7058), np.float32(3159.599)]
2025-09-14 18:22:00,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:22:00,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 34 minutes, 56 seconds)
2025-09-14 18:25:12,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:25:21,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2897.50488 ± 122.457
2025-09-14 18:25:21,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2916.3225), np.float32(2801.587), np.float32(3031.4211), np.float32(2925.9294), np.float32(3005.3616), np.float32(2828.5535), np.float32(2683.1016), np.float32(2735.1013), np.float32(3001.3499), np.float32(3046.3198)]
2025-09-14 18:25:21,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:25:21,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 30 minutes, 22 seconds)
2025-09-14 18:28:32,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:28:41,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2935.09253 ± 72.602
2025-09-14 18:28:41,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2851.1199), np.float32(2925.4128), np.float32(2953.6614), np.float32(3063.9675), np.float32(2778.08), np.float32(2965.9453), np.float32(2969.733), np.float32(2947.4463), np.float32(2921.9714), np.float32(2973.588)]
2025-09-14 18:28:41,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:28:41,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 26 minutes, 24 seconds)
2025-09-14 18:31:37,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:31:46,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2955.17896 ± 137.861
2025-09-14 18:31:46,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2773.3596), np.float32(2838.186), np.float32(2898.0776), np.float32(3119.4487), np.float32(2894.446), np.float32(3106.5688), np.float32(3095.3118), np.float32(2983.069), np.float32(3100.2593), np.float32(2743.0603)]
2025-09-14 18:31:46,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:31:46,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 20 minutes, 44 seconds)
2025-09-14 18:34:50,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:35:00,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2880.19849 ± 99.075
2025-09-14 18:35:00,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3042.1206), np.float32(2866.161), np.float32(2748.1978), np.float32(2767.9224), np.float32(2956.0803), np.float32(3034.7664), np.float32(2862.5042), np.float32(2827.52), np.float32(2789.4182), np.float32(2907.2937)]
2025-09-14 18:35:00,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:35:00,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 17 minutes, 3 seconds)
2025-09-14 18:38:01,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:38:10,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2886.79810 ± 152.478
2025-09-14 18:38:10,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2729.2659), np.float32(2976.4675), np.float32(2831.305), np.float32(2699.843), np.float32(2821.3896), np.float32(2829.557), np.float32(3007.1035), np.float32(3249.7632), np.float32(2804.326), np.float32(2918.9614)]
2025-09-14 18:38:10,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:38:10,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 12 minutes, 34 seconds)
2025-09-14 18:41:12,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:41:22,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2888.80835 ± 129.800
2025-09-14 18:41:22,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2738.3218), np.float32(2954.9465), np.float32(3163.101), np.float32(2753.5088), np.float32(2921.2996), np.float32(3030.963), np.float32(2851.7466), np.float32(2732.201), np.float32(2878.701), np.float32(2863.2932)]
2025-09-14 18:41:22,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:41:22,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 8 minutes, 2 seconds)
2025-09-14 18:44:27,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:44:36,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2921.50220 ± 175.652
2025-09-14 18:44:36,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3052.1309), np.float32(2927.9146), np.float32(3142.8684), np.float32(2791.192), np.float32(3005.6873), np.float32(2662.9084), np.float32(3104.3489), np.float32(2585.998), np.float32(2988.245), np.float32(2953.7288)]
2025-09-14 18:44:36,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:44:36,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 4 minutes, 5 seconds)
2025-09-14 18:47:31,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:47:40,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2877.17065 ± 109.140
2025-09-14 18:47:40,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2825.1182), np.float32(2949.1528), np.float32(3036.3792), np.float32(2925.1284), np.float32(2954.887), np.float32(2710.4668), np.float32(3000.7798), np.float32(2775.8225), np.float32(2872.274), np.float32(2721.6987)]
2025-09-14 18:47:40,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:47:40,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 46 seconds)
2025-09-14 18:50:24,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:50:33,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2952.84326 ± 102.365
2025-09-14 18:50:33,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2943.0542), np.float32(3158.4485), np.float32(2930.5818), np.float32(2834.9282), np.float32(3011.5684), np.float32(2847.528), np.float32(2900.8608), np.float32(3095.5598), np.float32(2955.0464), np.float32(2850.8557)]
2025-09-14 18:50:33,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:50:33,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 55 minutes, 5 seconds)
2025-09-14 18:53:23,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:53:32,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2861.99854 ± 124.739
2025-09-14 18:53:32,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2932.4033), np.float32(2720.1885), np.float32(3006.8386), np.float32(2638.4954), np.float32(2989.3257), np.float32(2697.9458), np.float32(2889.2524), np.float32(2965.939), np.float32(2930.9146), np.float32(2848.6821)]
2025-09-14 18:53:32,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:53:32,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 50 minutes, 38 seconds)
2025-09-14 18:56:16,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:56:25,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2956.86401 ± 113.110
2025-09-14 18:56:25,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2879.7173), np.float32(3012.5867), np.float32(2984.8135), np.float32(2950.3745), np.float32(3152.8115), np.float32(3022.4678), np.float32(2738.8633), np.float32(3074.1685), np.float32(2852.6208), np.float32(2900.2168)]
2025-09-14 18:56:25,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:56:25,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2956.86) for latency 24
2025-09-14 18:56:25,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 45 minutes, 25 seconds)
2025-09-14 18:59:11,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:59:20,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2901.22607 ± 113.363
2025-09-14 18:59:20,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3106.415), np.float32(2833.735), np.float32(2875.49), np.float32(2974.3684), np.float32(2771.8066), np.float32(2948.7227), np.float32(2699.9434), np.float32(3024.163), np.float32(2910.014), np.float32(2867.6023)]
2025-09-14 18:59:20,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:59:20,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 40 minutes, 12 seconds)
2025-09-14 19:02:07,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:02:15,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2936.87427 ± 124.167
2025-09-14 19:02:15,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2802.6628), np.float32(2675.1929), np.float32(3044.5862), np.float32(3141.8984), np.float32(3001.9785), np.float32(2933.4119), np.float32(2991.1033), np.float32(2867.366), np.float32(2981.2463), np.float32(2929.2979)]
2025-09-14 19:02:15,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:02:15,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 36 minutes, 18 seconds)
2025-09-14 19:05:04,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:05:13,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2887.55591 ± 185.267
2025-09-14 19:05:13,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2713.1343), np.float32(2890.872), np.float32(2848.6375), np.float32(3203.3738), np.float32(2990.582), np.float32(2999.0244), np.float32(2665.9731), np.float32(2851.7422), np.float32(3117.0225), np.float32(2595.196)]
2025-09-14 19:05:13,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:05:13,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 33 minutes, 51 seconds)
2025-09-14 19:08:00,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:08:09,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2949.83862 ± 102.854
2025-09-14 19:08:09,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2872.1284), np.float32(3035.93), np.float32(2716.0405), np.float32(2955.376), np.float32(2929.9666), np.float32(3045.961), np.float32(2954.6538), np.float32(3093.1711), np.float32(2886.6125), np.float32(3008.5479)]
2025-09-14 19:08:09,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:08:09,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 30 minutes, 41 seconds)
2025-09-14 19:11:09,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:11:19,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2920.76782 ± 99.899
2025-09-14 19:11:19,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2913.4067), np.float32(2845.0222), np.float32(3141.797), np.float32(3012.6824), np.float32(2953.1824), np.float32(2864.886), np.float32(2755.4514), np.float32(2962.332), np.float32(2884.9353), np.float32(2873.9817)]
2025-09-14 19:11:19,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:11:19,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 29 minutes, 19 seconds)
2025-09-14 19:14:30,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:14:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2955.53516 ± 162.198
2025-09-14 19:14:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2979.0908), np.float32(3207.1501), np.float32(2982.7642), np.float32(3104.7615), np.float32(2616.998), np.float32(2781.481), np.float32(3039.283), np.float32(2996.7097), np.float32(2822.1897), np.float32(3024.9238)]
2025-09-14 19:14:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:14:39,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 28 minutes, 49 seconds)
2025-09-14 19:17:49,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:17:58,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2875.53467 ± 112.382
2025-09-14 19:17:58,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2980.6816), np.float32(3016.5217), np.float32(2842.9822), np.float32(2944.0984), np.float32(2779.9275), np.float32(2764.2004), np.float32(2837.2603), np.float32(3030.1855), np.float32(2890.0474), np.float32(2669.4424)]
2025-09-14 19:17:58,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:17:58,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 27 minutes, 58 seconds)
2025-09-14 19:21:09,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:21:18,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2878.90649 ± 124.573
2025-09-14 19:21:18,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2975.6023), np.float32(2792.8828), np.float32(2810.111), np.float32(2681.163), np.float32(2887.6758), np.float32(2724.125), np.float32(2861.0315), np.float32(3068.4448), np.float32(2931.4834), np.float32(3056.5447)]
2025-09-14 19:21:18,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:21:18,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 26 minutes, 53 seconds)
2025-09-14 19:24:24,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:24:33,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2968.10083 ± 169.363
2025-09-14 19:24:33,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2807.6653), np.float32(3035.036), np.float32(2752.0605), np.float32(3389.7507), np.float32(2904.2878), np.float32(2941.5647), np.float32(2930.0454), np.float32(2855.6853), np.float32(2982.393), np.float32(3082.52)]
2025-09-14 19:24:33,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:24:33,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2968.10) for latency 24
2025-09-14 19:24:33,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 25 minutes, 16 seconds)
2025-09-14 19:27:38,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:27:48,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2950.31616 ± 148.249
2025-09-14 19:27:48,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2950.508), np.float32(2958.0586), np.float32(3076.6978), np.float32(2977.1301), np.float32(2877.8845), np.float32(3003.9636), np.float32(3025.9136), np.float32(3161.5974), np.float32(2573.792), np.float32(2897.6152)]
2025-09-14 19:27:48,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:27:48,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 22 minutes, 26 seconds)
2025-09-14 19:30:54,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:31:03,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2879.21729 ± 164.922
2025-09-14 19:31:03,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3177.1716), np.float32(2609.53), np.float32(2871.5312), np.float32(2939.2556), np.float32(2864.8562), np.float32(2762.2883), np.float32(2788.436), np.float32(2968.4167), np.float32(3099.4841), np.float32(2711.204)]
2025-09-14 19:31:03,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:31:03,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 18 minutes, 43 seconds)
2025-09-14 19:34:08,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:34:18,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2989.56323 ± 151.094
2025-09-14 19:34:18,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2916.2712), np.float32(2868.9785), np.float32(2835.5503), np.float32(2970.2458), np.float32(2853.563), np.float32(3159.2227), np.float32(2836.5713), np.float32(3234.378), np.float32(2999.8557), np.float32(3220.9968)]
2025-09-14 19:34:18,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:34:18,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2989.56) for latency 24
2025-09-14 19:34:18,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 15 minutes, 6 seconds)
2025-09-14 19:37:24,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:37:33,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2857.49561 ± 78.330
2025-09-14 19:37:33,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2868.968), np.float32(2867.656), np.float32(3057.092), np.float32(2760.1565), np.float32(2802.1196), np.float32(2849.6895), np.float32(2914.7964), np.float32(2840.2866), np.float32(2803.9485), np.float32(2810.2417)]
2025-09-14 19:37:33,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:37:33,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 11 minutes, 28 seconds)
2025-09-14 19:40:41,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:40:50,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2879.36133 ± 129.353
2025-09-14 19:40:50,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2977.8877), np.float32(2849.088), np.float32(3078.155), np.float32(3058.5823), np.float32(2824.5415), np.float32(2901.8894), np.float32(2788.9888), np.float32(2629.202), np.float32(2778.988), np.float32(2906.292)]
2025-09-14 19:40:50,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:40:50,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 8 minutes, 20 seconds)
2025-09-14 19:43:58,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:44:08,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2851.71387 ± 89.401
2025-09-14 19:44:08,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2770.6064), np.float32(3061.4902), np.float32(2753.8938), np.float32(2779.416), np.float32(2941.3848), np.float32(2817.4697), np.float32(2872.5356), np.float32(2856.092), np.float32(2877.465), np.float32(2786.7854)]
2025-09-14 19:44:08,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:44:08,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 5 minutes, 18 seconds)
2025-09-14 19:47:11,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:47:21,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2901.24268 ± 129.153
2025-09-14 19:47:21,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3030.175), np.float32(2807.124), np.float32(3117.1882), np.float32(2774.3433), np.float32(2840.7573), np.float32(2735.3154), np.float32(3022.8909), np.float32(2966.764), np.float32(2747.7444), np.float32(2970.1245)]
2025-09-14 19:47:21,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:47:21,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 1 minute, 55 seconds)
2025-09-14 19:50:29,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:50:39,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2872.12280 ± 117.927
2025-09-14 19:50:39,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2889.794), np.float32(2949.8823), np.float32(2725.3552), np.float32(2789.264), np.float32(2879.7205), np.float32(3158.608), np.float32(2735.9563), np.float32(2837.7593), np.float32(2912.032), np.float32(2842.8567)]
2025-09-14 19:50:39,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:50:39,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 58 minutes, 50 seconds)
2025-09-14 19:53:44,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:53:54,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2973.53125 ± 86.274
2025-09-14 19:53:54,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2838.7415), np.float32(2936.458), np.float32(3041.9346), np.float32(2933.783), np.float32(2999.826), np.float32(3103.583), np.float32(3010.3125), np.float32(2999.1636), np.float32(3050.129), np.float32(2821.3833)]
2025-09-14 19:53:54,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:53:54,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 55 minutes, 34 seconds)
2025-09-14 19:57:04,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:57:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2888.54590 ± 164.706
2025-09-14 19:57:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2634.4712), np.float32(3092.3855), np.float32(2997.741), np.float32(2919.0134), np.float32(2806.7864), np.float32(3005.2502), np.float32(2661.6433), np.float32(2986.1458), np.float32(2702.601), np.float32(3079.4202)]
2025-09-14 19:57:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:57:13,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 52 minutes, 27 seconds)
2025-09-14 20:00:26,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:00:36,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2969.70557 ± 92.033
2025-09-14 20:00:36,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2900.5767), np.float32(2924.6865), np.float32(3005.3271), np.float32(3009.7185), np.float32(2939.5916), np.float32(2853.2043), np.float32(2991.9067), np.float32(2859.5469), np.float32(3033.348), np.float32(3179.1501)]
2025-09-14 20:00:36,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:00:36,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 49 minutes, 24 seconds)
2025-09-14 20:03:41,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:03:51,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2976.38818 ± 191.018
2025-09-14 20:03:51,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3175.372), np.float32(2989.2883), np.float32(3143.2686), np.float32(3174.0522), np.float32(2748.1018), np.float32(2806.2615), np.float32(2760.3567), np.float32(2762.0708), np.float32(3264.8862), np.float32(2940.2224)]
2025-09-14 20:03:51,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:03:51,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 46 minutes, 12 seconds)
2025-09-14 20:06:57,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:07:07,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2922.04175 ± 179.397
2025-09-14 20:07:07,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2673.8682), np.float32(2841.159), np.float32(2626.9421), np.float32(3003.5435), np.float32(3236.8835), np.float32(2986.4321), np.float32(3088.8096), np.float32(2920.5874), np.float32(2800.9128), np.float32(3041.2803)]
2025-09-14 20:07:07,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:07:07,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 42 minutes, 50 seconds)
2025-09-14 20:10:12,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:10:21,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2956.06128 ± 104.275
2025-09-14 20:10:21,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2864.6707), np.float32(2999.2344), np.float32(2967.452), np.float32(2874.321), np.float32(2931.016), np.float32(2868.0857), np.float32(3240.3179), np.float32(2935.5332), np.float32(2968.2554), np.float32(2911.7263)]
2025-09-14 20:10:21,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:10:21,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 39 minutes, 29 seconds)
2025-09-14 20:13:34,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:13:43,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2932.31958 ± 142.169
2025-09-14 20:13:43,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3043.9092), np.float32(2963.5247), np.float32(3011.8328), np.float32(2607.1443), np.float32(3071.3518), np.float32(2776.992), np.float32(3084.4902), np.float32(2948.5935), np.float32(2842.3323), np.float32(2973.025)]
2025-09-14 20:13:43,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:13:43,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 36 minutes, 17 seconds)
2025-09-14 20:16:49,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:16:59,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3010.30957 ± 155.576
2025-09-14 20:16:59,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2926.293), np.float32(2836.4785), np.float32(3071.9277), np.float32(3186.8103), np.float32(3139.081), np.float32(2816.268), np.float32(3301.822), np.float32(2980.9792), np.float32(2837.9248), np.float32(3005.5105)]
2025-09-14 20:16:59,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:16:59,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3010.31) for latency 24
2025-09-14 20:16:59,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 32 minutes, 45 seconds)
2025-09-14 20:20:07,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:20:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2974.17529 ± 140.781
2025-09-14 20:20:17,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2741.7913), np.float32(2916.832), np.float32(3049.4863), np.float32(2787.7964), np.float32(2990.3843), np.float32(2856.5867), np.float32(3098.2852), np.float32(3001.717), np.float32(3083.0742), np.float32(3215.801)]
2025-09-14 20:20:17,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:20:17,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 29 minutes, 35 seconds)
2025-09-14 20:23:24,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:23:34,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2957.38232 ± 94.555
2025-09-14 20:23:34,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2880.6877), np.float32(3029.9749), np.float32(3029.131), np.float32(2907.86), np.float32(2840.744), np.float32(3148.1057), np.float32(2983.6978), np.float32(3016.3655), np.float32(2887.9717), np.float32(2849.2837)]
2025-09-14 20:23:34,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:23:34,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 26 minutes, 18 seconds)
2025-09-14 20:26:39,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:26:48,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2952.38159 ± 116.313
2025-09-14 20:26:48,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2889.9023), np.float32(3035.1829), np.float32(2890.3628), np.float32(2933.1428), np.float32(2831.2085), np.float32(2925.6855), np.float32(3248.3762), np.float32(2891.1702), np.float32(2858.0945), np.float32(3020.6907)]
2025-09-14 20:26:48,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:26:48,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes, 1 second)
2025-09-14 20:30:01,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:30:11,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3039.72021 ± 132.179
2025-09-14 20:30:11,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3042.1577), np.float32(2980.5996), np.float32(2974.8733), np.float32(3246.8647), np.float32(3163.4512), np.float32(2815.4136), np.float32(2850.721), np.float32(3050.5825), np.float32(3091.5916), np.float32(3180.9482)]
2025-09-14 20:30:11,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:30:11,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3039.72) for latency 24
2025-09-14 20:30:11,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 45 seconds)
2025-09-14 20:33:06,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:33:15,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3001.99683 ± 104.733
2025-09-14 20:33:15,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2829.5793), np.float32(3159.1655), np.float32(2870.0742), np.float32(3122.7537), np.float32(3028.1912), np.float32(2920.146), np.float32(3057.6677), np.float32(3083.5637), np.float32(3022.4465), np.float32(2926.38)]
2025-09-14 20:33:15,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:33:15,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 16 seconds)
2025-09-14 20:36:21,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:36:31,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3046.87500 ± 115.263
2025-09-14 20:36:31,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3203.2192), np.float32(3103.4604), np.float32(3142.985), np.float32(2925.7278), np.float32(2920.0247), np.float32(3168.4414), np.float32(2980.332), np.float32(2959.5286), np.float32(3170.2803), np.float32(2894.751)]
2025-09-14 20:36:31,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:36:31,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3046.88) for latency 24
2025-09-14 20:36:31,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 59 seconds)
2025-09-14 20:39:37,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:39:47,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3008.01855 ± 128.170
2025-09-14 20:39:47,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3185.0916), np.float32(3187.326), np.float32(3000.8293), np.float32(3078.637), np.float32(3026.4402), np.float32(2904.686), np.float32(3099.6301), np.float32(2933.4504), np.float32(2898.4944), np.float32(2765.601)]
2025-09-14 20:39:47,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:39:47,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 44 seconds)
2025-09-14 20:43:01,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:43:10,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3071.15796 ± 104.556
2025-09-14 20:43:10,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2917.1824), np.float32(3287.5586), np.float32(2976.254), np.float32(3034.799), np.float32(3087.4753), np.float32(3116.871), np.float32(3097.6729), np.float32(3193.5132), np.float32(2987.2412), np.float32(3013.0105)]
2025-09-14 20:43:10,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:43:10,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3071.16) for latency 24
2025-09-14 20:43:10,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 32 seconds)
2025-09-14 20:46:12,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:46:22,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2915.20068 ± 160.506
2025-09-14 20:46:22,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2889.9783), np.float32(2879.8179), np.float32(2927.0986), np.float32(2969.9158), np.float32(2835.669), np.float32(2713.5698), np.float32(3184.2039), np.float32(2860.636), np.float32(2692.1484), np.float32(3198.9695)]
2025-09-14 20:46:22,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:46:22,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 14 seconds)
2025-09-14 20:49:20,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:49:29,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3009.23242 ± 137.035
2025-09-14 20:49:29,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2916.725), np.float32(2894.0918), np.float32(3105.0232), np.float32(2903.9758), np.float32(3217.4656), np.float32(3093.5115), np.float32(3066.03), np.float32(2726.4387), np.float32(3065.6428), np.float32(3103.419)]
2025-09-14 20:49:29,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:49:29,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1251 [DEBUG]: Training session finished
