2025-05-07 23:45:48,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4
2025-05-07 23:45:48,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4
2025-05-07 23:45:48,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x713e497cba90>}
2025-05-07 23:45:48,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1009 [DEBUG]: using device: cpu
2025-05-07 23:45:48,339 baseline-bpql-noisy-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 4 != 32
2025-05-07 23:45:48,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1031 [INFO]: Creating new trainer
2025-05-07 23:45:48,349 baseline-bpql-noisy-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=41, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-07 23:45:48,350 baseline-bpql-noisy-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-07 23:45:48,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1092 [DEBUG]: Starting training session...
2025-05-07 23:45:48,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 1/100
2025-05-07 23:48:25,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:48:38,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -385.82089 ± 36.309
2025-05-07 23:48:38,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-391.1902, -404.29575, -377.5711, -291.29266, -376.71747, -409.56973, -385.7004, -383.17368, -440.64636, -398.05127]
2025-05-07 23:48:38,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:48:38,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (-385.82) for latency ExtremeSparseL4U32
2025-05-07 23:48:38,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-07 23:48:38,998 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 23:48:39,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 41 minutes, 8 seconds)
2025-05-07 23:51:26,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:51:40,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -261.57870 ± 46.418
2025-05-07 23:51:40,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-286.90366, -206.46089, -266.59866, -283.58047, -163.35146, -227.97978, -326.56818, -265.59534, -304.107, -284.6416]
2025-05-07 23:51:40,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:51:40,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (-261.58) for latency ExtremeSparseL4U32
2025-05-07 23:51:40,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-07 23:51:40,048 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 23:51:40,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 47 minutes)
2025-05-07 23:54:28,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:54:41,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -23.48270 ± 41.845
2025-05-07 23:54:41,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [63.298477, -49.267822, -64.1927, -21.87359, -22.029705, -13.1849375, -23.634518, -93.612114, -36.44645, 26.116327]
2025-05-07 23:54:41,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:54:41,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (-23.48) for latency ExtremeSparseL4U32
2025-05-07 23:54:41,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-07 23:54:41,563 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 23:54:41,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 47 minutes, 12 seconds)
2025-05-07 23:57:28,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:57:41,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 336.91260 ± 242.278
2025-05-07 23:57:41,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [799.37067, -25.597202, 407.77362, -82.21978, 247.7931, 545.6119, 355.34857, 431.71777, 350.41638, 338.9112]
2025-05-07 23:57:41,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:57:41,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (336.91) for latency ExtremeSparseL4U32
2025-05-07 23:57:41,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-07 23:57:41,605 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 23:57:41,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 45 minutes, 12 seconds)
2025-05-08 00:00:30,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:00:43,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 538.93439 ± 608.070
2025-05-08 00:00:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1151.2538, 1266.7598, 79.639305, 104.57178, 256.5461, 1239.3218, -53.57567, -72.67669, 1429.8995, -12.396237]
2025-05-08 00:00:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:00:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (538.93) for latency ExtremeSparseL4U32
2025-05-08 00:00:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:00:43,820 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:00:43,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 43 minutes, 29 seconds)
2025-05-08 00:03:35,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:03:48,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1099.01001 ± 505.358
2025-05-08 00:03:48,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1059.5837, 691.1973, 1902.1266, 1517.95, 1429.8083, 1675.153, 1093.2034, 722.6839, 711.8341, 186.55959]
2025-05-08 00:03:48,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:03:48,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1099.01) for latency ExtremeSparseL4U32
2025-05-08 00:03:48,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:03:48,880 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:03:48,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 45 minutes, 5 seconds)
2025-05-08 00:06:40,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:06:53,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1773.97852 ± 567.921
2025-05-08 00:06:53,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2285.5898, 2193.9727, 1100.436, 1113.8184, 736.508, 2173.0981, 2132.4653, 1837.1117, 2494.5574, 1672.2273]
2025-05-08 00:06:53,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:06:53,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1773.98) for latency ExtremeSparseL4U32
2025-05-08 00:06:53,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:06:53,751 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:06:53,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 43 minutes, 14 seconds)
2025-05-08 00:09:45,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:09:58,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2346.47900 ± 503.459
2025-05-08 00:09:58,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2358.469, 1512.9691, 1758.5231, 3172.5981, 1977.6715, 2693.7202, 2467.8154, 2838.9556, 1956.1772, 2727.892]
2025-05-08 00:09:58,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:09:58,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (2346.48) for latency ExtremeSparseL4U32
2025-05-08 00:09:58,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:09:58,756 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:09:58,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 41 minutes, 16 seconds)
2025-05-08 00:12:50,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:13:04,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2317.65088 ± 915.979
2025-05-08 00:13:04,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2283.303, 2585.6877, 2788.425, 116.11775, 1122.1726, 2886.258, 2699.4753, 2760.0872, 3361.3037, 2573.6765]
2025-05-08 00:13:04,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:13:04,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 39 minutes, 48 seconds)
2025-05-08 00:15:55,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:16:09,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2351.51489 ± 858.057
2025-05-08 00:16:09,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1660.3983, 1959.4647, 2428.8577, 3051.3555, 2530.877, 3089.1318, 195.7066, 2512.2153, 2925.1616, 3161.9783]
2025-05-08 00:16:09,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:16:09,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (2351.51) for latency ExtremeSparseL4U32
2025-05-08 00:16:09,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:16:09,169 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:16:09,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 37 minutes, 36 seconds)
2025-05-08 00:19:00,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:19:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2944.04053 ± 532.623
2025-05-08 00:19:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3472.6265, 3011.5393, 2979.8125, 2505.6533, 1619.5162, 2816.402, 3146.9185, 3152.8267, 3637.1055, 3098.005]
2025-05-08 00:19:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:19:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (2944.04) for latency ExtremeSparseL4U32
2025-05-08 00:19:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:19:14,122 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:19:14,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 34 minutes, 29 seconds)
2025-05-08 00:22:05,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:22:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3122.10596 ± 218.297
2025-05-08 00:22:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2823.148, 2967.0254, 2858.9348, 3057.636, 3339.1416, 3180.8887, 3210.275, 3544.2075, 2956.8345, 3282.967]
2025-05-08 00:22:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:22:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (3122.11) for latency ExtremeSparseL4U32
2025-05-08 00:22:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:22:19,033 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:22:19,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 31 minutes, 24 seconds)
2025-05-08 00:25:10,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:25:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3488.20972 ± 531.321
2025-05-08 00:25:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3462.3904, 2031.578, 4082.4827, 3812.8438, 3846.8345, 3714.6401, 3641.7458, 3567.3557, 3360.203, 3362.024]
2025-05-08 00:25:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:25:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (3488.21) for latency ExtremeSparseL4U32
2025-05-08 00:25:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:25:23,863 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:25:23,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 28 minutes, 16 seconds)
2025-05-08 00:28:15,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:28:29,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2961.02515 ± 566.687
2025-05-08 00:28:29,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1870.0507, 2942.5613, 2151.1235, 3097.1687, 2888.1372, 3964.7808, 2880.3005, 3416.3179, 3252.4888, 3147.3225]
2025-05-08 00:28:29,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:28:29,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 25 minutes, 13 seconds)
2025-05-08 00:31:20,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:31:34,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3157.68652 ± 1080.197
2025-05-08 00:31:34,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3943.297, 3373.643, 42.62163, 3722.5413, 3629.8875, 3345.98, 3108.9092, 2936.1616, 3716.2212, 3757.604]
2025-05-08 00:31:34,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:31:34,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 22 minutes, 7 seconds)
2025-05-08 00:34:18,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:34:31,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3198.30762 ± 1099.844
2025-05-08 00:34:31,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3802.4204, 2670.6785, 198.01602, 3542.9773, 3568.2544, 4105.638, 2850.7422, 3271.8428, 3938.063, 4034.443]
2025-05-08 00:34:31,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:34:31,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 16 minutes, 45 seconds)
2025-05-08 00:37:12,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:37:25,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3714.85620 ± 432.256
2025-05-08 00:37:25,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3830.232, 3724.5586, 4487.5205, 3788.6064, 3490.748, 3649.2727, 2650.8457, 3759.4805, 3942.7258, 3824.5693]
2025-05-08 00:37:25,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:37:25,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (3714.86) for latency ExtremeSparseL4U32
2025-05-08 00:37:25,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:37:25,280 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:37:25,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 10 minutes, 43 seconds)
2025-05-08 00:40:07,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:40:20,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3726.48047 ± 576.107
2025-05-08 00:40:20,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4516.6436, 4205.5015, 3769.533, 3824.6545, 2243.4775, 3892.6262, 3298.5088, 3827.1843, 3898.891, 3787.785]
2025-05-08 00:40:20,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:40:20,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (3726.48) for latency ExtremeSparseL4U32
2025-05-08 00:40:20,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:40:20,044 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:40:20,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 4 minutes, 57 seconds)
2025-05-08 00:43:03,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:43:15,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3915.94409 ± 550.335
2025-05-08 00:43:15,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3738.1235, 3953.4744, 3746.0947, 4435.885, 3809.6082, 3753.1562, 2585.1091, 4491.937, 4657.945, 3988.108]
2025-05-08 00:43:15,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:43:15,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (3915.94) for latency ExtremeSparseL4U32
2025-05-08 00:43:15,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:43:15,394 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:43:15,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 59 minutes, 15 seconds)
2025-05-08 00:45:57,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:46:10,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4048.38672 ± 476.675
2025-05-08 00:46:10,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4536.6475, 4481.2393, 3065.566, 4208.102, 3749.7854, 4569.2246, 3796.4424, 4034.3315, 4484.624, 3557.9065]
2025-05-08 00:46:10,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:46:10,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (4048.39) for latency ExtremeSparseL4U32
2025-05-08 00:46:10,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:46:10,158 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:46:10,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 53 minutes, 33 seconds)
2025-05-08 00:48:50,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:49:02,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3666.51123 ± 1242.852
2025-05-08 00:49:02,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2354.8088, 4772.571, 3941.7334, 3706.954, 4784.7646, 3986.076, 517.983, 4507.905, 4332.0405, 3760.2773]
2025-05-08 00:49:02,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:49:02,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 49 minutes, 26 seconds)
2025-05-08 00:51:44,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:51:57,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3434.48242 ± 1091.515
2025-05-08 00:51:57,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3031.0056, 3593.716, 1053.7283, 4189.654, 2092.8616, 4041.895, 5079.37, 4007.9402, 3951.2458, 3303.4075]
2025-05-08 00:51:57,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:51:57,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 46 minutes, 51 seconds)
2025-05-08 00:54:43,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:54:56,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4104.28955 ± 251.544
2025-05-08 00:54:56,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4287.5137, 4017.29, 4189.9834, 4329.561, 3623.7976, 4360.244, 4317.1265, 4209.9995, 3687.7803, 4019.599]
2025-05-08 00:54:56,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:54:56,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (4104.29) for latency ExtremeSparseL4U32
2025-05-08 00:54:56,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:54:56,242 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:54:56,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 44 minutes, 53 seconds)
2025-05-08 00:57:41,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:57:54,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3436.88330 ± 1207.372
2025-05-08 00:57:54,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [590.4276, 4222.3823, 4250.918, 3461.951, 4171.597, 4089.6545, 1685.4366, 3833.8787, 4416.7476, 3645.8362]
2025-05-08 00:57:54,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:57:54,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 42 minutes, 44 seconds)
2025-05-08 01:00:43,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:00:57,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4137.78467 ± 438.188
2025-05-08 01:00:57,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3895.8186, 3598.1196, 4530.2397, 4125.3623, 4946.965, 3468.4941, 4522.181, 4354.0146, 3793.0325, 4143.6216]
2025-05-08 01:00:57,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:00:57,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (4137.78) for latency ExtremeSparseL4U32
2025-05-08 01:00:57,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:00:57,026 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:00:57,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 41 minutes, 43 seconds)
2025-05-08 01:03:48,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:04:02,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4334.61230 ± 1068.717
2025-05-08 01:04:02,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1360.899, 4413.265, 4439.409, 4644.4937, 5035.3306, 5080.9736, 5250.859, 4961.9854, 3915.1248, 4243.7837]
2025-05-08 01:04:02,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:04:02,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (4334.61) for latency ExtremeSparseL4U32
2025-05-08 01:04:02,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:04:02,215 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:04:02,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 41 minutes, 57 seconds)
2025-05-08 01:06:54,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:07:07,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4277.19727 ± 434.255
2025-05-08 01:07:07,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4426.063, 4962.5576, 4115.3853, 4755.3716, 4482.107, 4760.9, 3839.459, 3790.0818, 3909.9978, 3730.049]
2025-05-08 01:07:07,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:07:07,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 41 minutes, 27 seconds)
2025-05-08 01:09:59,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:10:13,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3506.88281 ± 1602.728
2025-05-08 01:10:13,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4968.8965, 435.04092, 4932.995, 4189.7954, 3937.3567, 4310.951, 4015.7695, 4832.222, 627.22577, 2818.5742]
2025-05-08 01:10:13,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:10:13,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 40 minutes, 8 seconds)
2025-05-08 01:13:05,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:13:19,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4769.86230 ± 332.384
2025-05-08 01:13:19,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4883.072, 4312.0654, 5173.4126, 4891.9897, 4305.666, 4484.5713, 4962.262, 4927.825, 5280.1396, 4477.6216]
2025-05-08 01:13:19,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:13:19,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (4769.86) for latency ExtremeSparseL4U32
2025-05-08 01:13:19,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:13:19,346 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:13:19,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 38 minutes, 50 seconds)
2025-05-08 01:16:11,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:16:25,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4318.39258 ± 704.177
2025-05-08 01:16:25,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4421.2725, 3696.545, 4686.4404, 3040.6265, 4299.6606, 5050.571, 5096.659, 4271.8843, 5218.0986, 3402.1714]
2025-05-08 01:16:25,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:16:25,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 36 minutes, 32 seconds)
2025-05-08 01:19:17,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:19:30,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4112.35156 ± 416.445
2025-05-08 01:19:30,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4187.233, 4317.2754, 4497.449, 4636.3984, 3948.8176, 3666.3325, 3208.2764, 4242.4307, 4513.6685, 3905.632]
2025-05-08 01:19:30,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:19:30,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 33 minutes, 34 seconds)
2025-05-08 01:22:22,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:22:36,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4304.16504 ± 603.731
2025-05-08 01:22:36,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5073.983, 4416.541, 4586.496, 4656.095, 4012.8792, 3184.1409, 4501.3306, 4387.093, 4941.254, 3281.835]
2025-05-08 01:22:36,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:22:36,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 30 minutes, 26 seconds)
2025-05-08 01:25:28,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:25:41,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4494.04834 ± 841.268
2025-05-08 01:25:41,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5379.938, 4969.877, 4191.0967, 4871.462, 3866.065, 5263.7495, 5141.882, 4241.744, 2408.5793, 4606.0933]
2025-05-08 01:25:41,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:25:41,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 27 minutes, 19 seconds)
2025-05-08 01:28:33,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:28:47,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4266.16162 ± 456.670
2025-05-08 01:28:47,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4821.131, 5145.562, 3872.2224, 4396.7065, 4598.4824, 4197.1294, 3662.656, 3735.3845, 4229.684, 4002.6604]
2025-05-08 01:28:47,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:28:47,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 24 minutes, 9 seconds)
2025-05-08 01:31:39,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:31:53,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4482.10742 ± 1104.846
2025-05-08 01:31:53,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5113.475, 3879.603, 4169.916, 5164.19, 5173.047, 4436.07, 1497.9907, 4759.992, 5174.2275, 5452.566]
2025-05-08 01:31:53,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:31:53,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 21 minutes, 4 seconds)
2025-05-08 01:34:36,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:34:48,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4979.90430 ± 351.487
2025-05-08 01:34:48,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4130.3203, 4975.231, 5270.335, 4846.7744, 5082.6465, 5181.248, 4842.3677, 5503.2783, 4799.5957, 5167.2446]
2025-05-08 01:34:48,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:34:48,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (4979.90) for latency ExtremeSparseL4U32
2025-05-08 01:34:48,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:34:48,661 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:34:48,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 15 minutes, 48 seconds)
2025-05-08 01:37:29,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:37:41,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4546.22412 ± 465.356
2025-05-08 01:37:41,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4773.7554, 3820.4553, 4609.272, 5157.604, 3967.6619, 4872.0654, 4214.6265, 4716.8037, 4116.7686, 5213.232]
2025-05-08 01:37:41,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:37:41,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 10 minutes, 10 seconds)
2025-05-08 01:40:22,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:40:35,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3983.08057 ± 950.393
2025-05-08 01:40:35,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4161.055, 3752.0872, 3707.201, 2023.3688, 5286.337, 3631.472, 2962.1501, 4962.4976, 4313.77, 5030.863]
2025-05-08 01:40:35,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:40:35,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 4 minutes, 37 seconds)
2025-05-08 01:43:16,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:43:29,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3955.81055 ± 1127.929
2025-05-08 01:43:29,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4434.54, 4339.4736, 3377.7373, 5250.699, 5125.1045, 3192.2944, 4678.15, 2733.5742, 1590.5593, 4835.976]
2025-05-08 01:43:29,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:43:29,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 59 minutes, 21 seconds)
2025-05-08 01:46:15,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:46:28,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4843.82129 ± 531.158
2025-05-08 01:46:28,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5229.327, 4999.572, 4005.1836, 3947.511, 5491.181, 4885.819, 5410.872, 5326.9277, 4663.0933, 4478.7256]
2025-05-08 01:46:28,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:46:28,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 55 minutes, 6 seconds)
2025-05-08 01:49:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:49:28,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4571.34619 ± 510.809
2025-05-08 01:49:28,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4897.922, 3530.345, 5124.503, 4379.9746, 5306.6646, 4494.46, 4170.419, 4132.6924, 4968.791, 4707.687]
2025-05-08 01:49:28,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:49:28,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 52 minutes, 57 seconds)
2025-05-08 01:52:14,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:52:27,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3627.58594 ± 1332.512
2025-05-08 01:52:27,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4194.2554, 4770.2437, 5083.1733, 2253.4644, 3871.4417, 3910.3582, 3856.0522, 230.7064, 4174.835, 3931.3286]
2025-05-08 01:52:27,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:52:27,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 51 minutes, 15 seconds)
2025-05-08 01:55:14,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:55:26,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4810.77832 ± 517.990
2025-05-08 01:55:26,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5130.9634, 5307.213, 4599.616, 3686.3574, 4537.63, 5452.985, 4237.7407, 5174.797, 4909.8613, 5070.6216]
2025-05-08 01:55:26,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:55:26,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 49 minutes, 26 seconds)
2025-05-08 01:58:13,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:58:26,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4835.99512 ± 609.656
2025-05-08 01:58:26,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4257.308, 5534.236, 5234.166, 4723.692, 5075.97, 4050.328, 5011.0083, 4686.137, 3896.3936, 5890.7124]
2025-05-08 01:58:26,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:58:26,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 47 minutes, 21 seconds)
2025-05-08 02:01:11,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:01:24,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4052.55420 ± 1513.029
2025-05-08 02:01:24,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4941.166, 1548.7657, 5366.602, 5060.0947, 4712.3916, 4320.8306, 672.0073, 4514.1147, 4451.151, 4938.4175]
2025-05-08 02:01:24,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:01:24,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 44 minutes, 16 seconds)
2025-05-08 02:04:10,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:04:23,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4290.37207 ± 1231.808
2025-05-08 02:04:23,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4851.143, 891.5195, 5161.3496, 3525.837, 4867.456, 5395.9727, 4334.101, 4437.6396, 4627.9463, 4810.757]
2025-05-08 02:04:23,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:04:23,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 41 minutes, 12 seconds)
2025-05-08 02:07:10,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:07:23,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4010.33057 ± 1710.029
2025-05-08 02:07:23,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4830.251, 4268.705, 299.17725, 5043.4155, 3609.4143, 5109.27, 1451.2542, 5711.654, 4082.8206, 5697.3438]
2025-05-08 02:07:23,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:07:23,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 38 minutes, 17 seconds)
2025-05-08 02:10:09,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:10:22,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4480.05713 ± 798.347
2025-05-08 02:10:22,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5762.7085, 3872.1775, 3793.1555, 3265.6794, 4943.8496, 4912.016, 4640.51, 4879.5767, 5313.866, 3417.0308]
2025-05-08 02:10:22,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:10:22,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 35 minutes, 17 seconds)
2025-05-08 02:13:07,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:13:20,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4916.02197 ± 429.318
2025-05-08 02:13:20,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5220.6177, 5529.2563, 4586.3823, 4873.829, 4421.6455, 4083.2686, 5318.112, 4977.8296, 4839.9487, 5309.327]
2025-05-08 02:13:20,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:13:20,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 32 minutes, 2 seconds)
2025-05-08 02:16:05,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:16:18,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4716.70264 ± 590.383
2025-05-08 02:16:18,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4776.5684, 3750.1501, 3956.7107, 5551.277, 4467.2993, 5245.0703, 4570.66, 5655.8965, 4593.3457, 4600.0503]
2025-05-08 02:16:18,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:16:18,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 28 minutes, 55 seconds)
2025-05-08 02:19:03,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:19:15,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5053.64746 ± 448.600
2025-05-08 02:19:15,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5053.7227, 4778.647, 4649.0527, 4834.9023, 5103.525, 5720.207, 5623.69, 4159.9463, 5422.6914, 5190.085]
2025-05-08 02:19:15,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:19:15,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (5053.65) for latency ExtremeSparseL4U32
2025-05-08 02:19:15,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 02:19:15,890 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 02:19:15,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 25 minutes, 43 seconds)
2025-05-08 02:22:00,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:22:13,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4685.98340 ± 684.674
2025-05-08 02:22:13,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3298.0942, 4738.2173, 5471.98, 5193.0327, 4389.8403, 3573.822, 5133.063, 5145.8477, 4954.532, 4961.4067]
2025-05-08 02:22:13,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:22:13,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 22 minutes, 23 seconds)
2025-05-08 02:24:59,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:25:12,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4664.90332 ± 454.507
2025-05-08 02:25:12,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3660.9949, 4848.829, 4561.8286, 4489.835, 5352.716, 4663.037, 4678.349, 4532.6206, 5349.3804, 4511.4453]
2025-05-08 02:25:12,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:25:12,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 19 minutes, 22 seconds)
2025-05-08 02:27:58,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:28:11,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4875.99902 ± 432.802
2025-05-08 02:28:11,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5311.7627, 4515.0913, 3931.262, 5390.135, 5424.7344, 4978.5093, 4807.1284, 4936.096, 4600.182, 4865.0884]
2025-05-08 02:28:11,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:28:11,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 16 minutes, 36 seconds)
2025-05-08 02:30:56,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:31:09,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4512.54395 ± 1105.869
2025-05-08 02:31:09,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4892.7163, 4996.8774, 4406.004, 4819.356, 4906.185, 5415.717, 4997.505, 1356.4958, 5182.3057, 4152.2793]
2025-05-08 02:31:09,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:31:09,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 13 minutes, 45 seconds)
2025-05-08 02:33:55,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:34:08,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4902.83105 ± 472.271
2025-05-08 02:34:08,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5406.595, 4693.357, 4826.419, 5486.428, 3895.9763, 4660.9326, 5365.7124, 4777.556, 5330.548, 4584.786]
2025-05-08 02:34:08,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:34:08,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 10 minutes, 56 seconds)
2025-05-08 02:36:54,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:37:06,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4458.44385 ± 985.845
2025-05-08 02:37:06,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1875.3607, 5734.588, 4353.1733, 4218.444, 4549.144, 4457.025, 5400.2656, 4483.3726, 5127.8623, 4385.201]
2025-05-08 02:37:06,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:37:06,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 8 minutes, 2 seconds)
2025-05-08 02:39:52,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:40:05,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4918.72461 ± 478.697
2025-05-08 02:40:05,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4963.6987, 4210.4575, 4853.553, 4344.4604, 5337.7812, 4409.9023, 5138.2764, 5807.7397, 4795.688, 5325.693]
2025-05-08 02:40:05,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:40:05,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 4 minutes, 57 seconds)
2025-05-08 02:42:50,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:43:03,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4558.24316 ± 711.054
2025-05-08 02:43:03,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4809.4785, 5429.263, 4806.568, 5347.91, 4691.0835, 3831.2964, 3000.5142, 4383.505, 5157.139, 4125.67]
2025-05-08 02:43:03,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:43:03,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 1 minute, 52 seconds)
2025-05-08 02:45:49,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:46:02,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4980.59326 ± 325.097
2025-05-08 02:46:02,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5086.4307, 4595.2246, 4986.4014, 4709.4233, 4707.8003, 5270.313, 4799.765, 5527.308, 4661.3105, 5461.9565]
2025-05-08 02:46:02,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:46:02,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 58 minutes, 56 seconds)
2025-05-08 02:48:48,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:49:00,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4702.67480 ± 661.632
2025-05-08 02:49:00,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4314.4546, 3630.2473, 5331.9165, 4887.8667, 5316.281, 5643.167, 5155.0137, 3804.6335, 4837.35, 4105.815]
2025-05-08 02:49:00,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:49:00,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 55 minutes, 57 seconds)
2025-05-08 02:51:44,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:51:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5144.42529 ± 541.523
2025-05-08 02:51:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5879.1533, 5660.787, 5203.656, 3808.9146, 5405.452, 4832.7437, 5076.0493, 4953.7856, 5507.5063, 5116.204]
2025-05-08 02:51:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:51:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (5144.43) for latency ExtremeSparseL4U32
2025-05-08 02:51:56,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 02:51:56,546 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 02:51:56,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 52 minutes, 40 seconds)
2025-05-08 02:54:40,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:54:53,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4638.51904 ± 464.981
2025-05-08 02:54:53,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4784.896, 4270.445, 4742.598, 4483.8335, 4959.772, 4880.0503, 4274.5264, 5409.7397, 4937.043, 3642.2876]
2025-05-08 02:54:53,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:54:53,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 49 minutes, 31 seconds)
2025-05-08 02:57:36,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:57:48,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5106.18359 ± 454.445
2025-05-08 02:57:48,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4595.0264, 4900.9478, 5023.5244, 5946.692, 5228.3975, 5351.83, 5700.4077, 4336.337, 4980.37, 4998.299]
2025-05-08 02:57:48,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:57:48,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 46 minutes, 15 seconds)
2025-05-08 03:00:32,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:00:44,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4874.50488 ± 342.353
2025-05-08 03:00:44,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4856.9917, 4623.87, 4486.0356, 4323.2446, 5370.0273, 5219.076, 5277.691, 4818.8135, 5133.6484, 4635.653]
2025-05-08 03:00:44,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:00:44,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 42 minutes, 59 seconds)
2025-05-08 03:03:27,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:03:40,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4890.94629 ± 504.384
2025-05-08 03:03:40,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4400.7397, 4865.272, 4118.6714, 5692.2446, 4336.581, 4795.584, 4930.812, 4780.2, 5443.1216, 5546.243]
2025-05-08 03:03:40,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:03:40,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 39 minutes, 41 seconds)
2025-05-08 03:06:22,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:06:34,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4763.83789 ± 626.812
2025-05-08 03:06:34,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4675.1074, 4811.9336, 4296.519, 5450.605, 5450.1553, 3451.3633, 4561.151, 4445.2734, 4788.01, 5708.2627]
2025-05-08 03:06:34,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:06:34,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 36 minutes, 35 seconds)
2025-05-08 03:09:16,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:09:28,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5143.32129 ± 382.737
2025-05-08 03:09:28,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5204.5737, 5719.245, 4550.6167, 4741.7563, 4689.75, 5262.7446, 5692.7983, 5348.9717, 5270.492, 4952.266]
2025-05-08 03:09:28,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:09:29,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 33 minutes, 25 seconds)
2025-05-08 03:12:11,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:12:23,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4368.79297 ± 1272.863
2025-05-08 03:12:23,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5205.854, 923.34393, 4366.3823, 5011.941, 5033.127, 5498.557, 3368.956, 4798.9287, 4603.021, 4877.822]
2025-05-08 03:12:23,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:12:23,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 30 minutes, 25 seconds)
2025-05-08 03:15:05,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:15:18,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4800.51270 ± 741.811
2025-05-08 03:15:18,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5176.654, 5101.246, 3448.5671, 4740.1577, 5617.7397, 5915.77, 4230.342, 3831.1145, 4652.1084, 5291.43]
2025-05-08 03:15:18,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:15:18,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 27 minutes, 19 seconds)
2025-05-08 03:18:00,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:18:12,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4180.98926 ± 1418.423
2025-05-08 03:18:12,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3719.4631, 4511.782, 4881.948, 4914.17, 3030.146, 5172.224, 5597.602, 5730.9795, 3468.1675, 783.409]
2025-05-08 03:18:12,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:18:12,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 24 minutes, 19 seconds)
2025-05-08 03:20:58,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:21:10,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5120.31445 ± 369.492
2025-05-08 03:21:10,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5318.761, 4876.3384, 4869.651, 5265.725, 5628.669, 5748.617, 5336.902, 4733.1772, 4714.909, 4710.392]
2025-05-08 03:21:10,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:21:10,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 21 minutes, 47 seconds)
2025-05-08 03:23:56,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:24:09,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4793.54980 ± 927.110
2025-05-08 03:24:09,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4564.671, 5812.374, 4409.41, 5710.9536, 3996.357, 5570.0654, 4754.598, 5422.7314, 2604.4727, 5089.869]
2025-05-08 03:24:09,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:24:09,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 19 minutes, 13 seconds)
2025-05-08 03:26:52,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:27:05,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3821.50635 ± 1341.640
2025-05-08 03:27:05,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5028.4683, 3772.4607, 1702.9456, 3450.7979, 3593.7058, 5410.4375, 4856.1553, 4881.868, 1196.057, 4322.1626]
2025-05-08 03:27:05,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:27:05,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 16 minutes, 24 seconds)
2025-05-08 03:29:50,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:30:03,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4889.65869 ± 510.450
2025-05-08 03:30:03,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5305.338, 4790.2417, 4528.0654, 4499.1943, 4854.3706, 4986.8887, 5227.89, 4394.066, 6057.456, 4253.0737]
2025-05-08 03:30:03,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:30:03,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 13 minutes, 46 seconds)
2025-05-08 03:32:48,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:33:01,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4104.24902 ± 1367.308
2025-05-08 03:33:01,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [364.76364, 4914.139, 5149.1763, 5128.6426, 4916.716, 3662.0598, 4045.523, 3510.3904, 4833.4863, 4517.595]
2025-05-08 03:33:01,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:33:01,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 11 minutes, 8 seconds)
2025-05-08 03:35:46,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:35:58,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5249.41455 ± 380.264
2025-05-08 03:35:58,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5821.3125, 5737.1562, 4486.5864, 5215.763, 5430.663, 5245.065, 5396.4824, 4875.3135, 5343.9805, 4941.8193]
2025-05-08 03:35:58,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:35:58,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (5249.41) for latency ExtremeSparseL4U32
2025-05-08 03:35:58,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 03:35:58,658 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:35:58,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 8 minutes, 3 seconds)
2025-05-08 03:38:40,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:38:53,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4904.58398 ± 697.393
2025-05-08 03:38:53,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3814.4167, 3930.7114, 4931.5986, 3966.5994, 5743.9478, 5393.094, 5628.4927, 5135.888, 5043.137, 5457.9526]
2025-05-08 03:38:53,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:38:53,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 4 minutes, 49 seconds)
2025-05-08 03:41:34,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:41:47,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4806.03271 ± 1268.207
2025-05-08 03:41:47,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5715.9497, 6161.853, 5058.2266, 5299.943, 4905.2676, 5548.11, 4269.438, 1354.7065, 4460.7715, 5286.061]
2025-05-08 03:41:47,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:41:47,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 1 minute, 44 seconds)
2025-05-08 03:44:28,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:44:41,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4958.98145 ± 527.603
2025-05-08 03:44:41,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5482.2583, 5072.722, 3816.9094, 4887.836, 4338.0674, 5463.923, 5394.5137, 4759.8574, 5513.8247, 4859.904]
2025-05-08 03:44:41,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:44:41,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 81/100 (estimated time remaining: 58 minutes, 30 seconds)
2025-05-08 03:47:23,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:47:35,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4798.71973 ± 862.741
2025-05-08 03:47:35,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3803.2312, 2779.3333, 5442.31, 4957.854, 5122.9404, 4694.9897, 4544.106, 5742.861, 5536.065, 5363.505]
2025-05-08 03:47:35,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:47:35,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 82/100 (estimated time remaining: 55 minutes, 19 seconds)
2025-05-08 03:50:17,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:50:29,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4747.29248 ± 920.206
2025-05-08 03:50:29,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4776.056, 5549.9883, 5519.06, 4181.7217, 5379.156, 4840.376, 4781.7373, 4556.9106, 5551.927, 2335.9934]
2025-05-08 03:50:29,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:50:30,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 83/100 (estimated time remaining: 52 minutes, 16 seconds)
2025-05-08 03:53:12,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:53:24,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4597.65137 ± 810.128
2025-05-08 03:53:24,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2654.028, 4429.4136, 5257.362, 5366.2246, 5292.2495, 4806.95, 4919.5625, 4864.5586, 4785.825, 3600.3442]
2025-05-08 03:53:24,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:53:24,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 84/100 (estimated time remaining: 49 minutes, 23 seconds)
2025-05-08 03:56:07,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:56:19,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4591.76367 ± 1111.515
2025-05-08 03:56:19,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4645.666, 4893.2173, 5203.3135, 4320.312, 5425.7256, 5216.9834, 4748.2754, 5517.4453, 1455.639, 4491.059]
2025-05-08 03:56:19,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:56:19,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 85/100 (estimated time remaining: 46 minutes, 31 seconds)
2025-05-08 03:59:01,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:59:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5002.46143 ± 439.646
2025-05-08 03:59:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5279.244, 4985.201, 4156.8535, 5339.5576, 5471.762, 5009.528, 4419.6816, 5418.501, 4589.755, 5354.532]
2025-05-08 03:59:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:59:14,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 86/100 (estimated time remaining: 43 minutes, 39 seconds)
2025-05-08 04:01:56,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:02:09,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4829.54004 ± 880.759
2025-05-08 04:02:09,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5278.4478, 5533.063, 4203.2324, 3586.0413, 3260.7036, 4415.296, 5951.9575, 5869.617, 5058.3765, 5138.669]
2025-05-08 04:02:09,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:02:09,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 87/100 (estimated time remaining: 40 minutes, 46 seconds)
2025-05-08 04:04:51,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:05:04,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4950.36914 ± 444.697
2025-05-08 04:05:04,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4319.062, 4240.6357, 5054.868, 5022.0073, 5422.1265, 5643.6284, 5448.675, 4907.8774, 4690.627, 4754.185]
2025-05-08 04:05:04,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:05:04,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 88/100 (estimated time remaining: 37 minutes, 53 seconds)
2025-05-08 04:07:46,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:07:59,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4496.22656 ± 569.108
2025-05-08 04:07:59,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3333.7842, 4174.7837, 4626.461, 4705.1997, 4693.8193, 4913.2725, 4275.6504, 3832.6846, 5098.944, 5307.663]
2025-05-08 04:07:59,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:07:59,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 89/100 (estimated time remaining: 34 minutes, 59 seconds)
2025-05-08 04:10:41,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:10:54,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5217.68799 ± 550.760
2025-05-08 04:10:54,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5357.3467, 5564.528, 5446.9814, 4904.0723, 4389.235, 6128.2065, 4250.3677, 5313.375, 5733.372, 5089.396]
2025-05-08 04:10:54,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:10:54,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 90/100 (estimated time remaining: 32 minutes, 4 seconds)
2025-05-08 04:13:36,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:13:48,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5090.49805 ± 376.925
2025-05-08 04:13:48,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4335.942, 5036.2427, 5699.1763, 4982.7505, 5221.3853, 5640.399, 4860.875, 4920.663, 5283.8843, 4923.664]
2025-05-08 04:13:48,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:13:49,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 91/100 (estimated time remaining: 29 minutes, 9 seconds)
2025-05-08 04:16:31,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:16:43,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4725.17920 ± 676.132
2025-05-08 04:16:43,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5439.999, 4690.8794, 4715.006, 5369.3076, 2841.9624, 4872.2725, 4690.1714, 4828.8354, 4933.0874, 4870.2744]
2025-05-08 04:16:43,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:16:43,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 92/100 (estimated time remaining: 26 minutes, 13 seconds)
2025-05-08 04:19:25,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:19:38,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5298.47119 ± 494.524
2025-05-08 04:19:38,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4406.8784, 5814.807, 6302.827, 4967.741, 5611.138, 4932.4946, 5337.275, 5213.5317, 5252.296, 5145.717]
2025-05-08 04:19:38,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:19:38,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (5298.47) for latency ExtremeSparseL4U32
2025-05-08 04:19:38,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 04:19:38,078 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:19:38,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 93/100 (estimated time remaining: 23 minutes, 18 seconds)
2025-05-08 04:22:20,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:22:32,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5040.58350 ± 441.478
2025-05-08 04:22:32,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5069.5854, 5627.8335, 4357.41, 4934.1333, 4312.572, 5525.7144, 4903.7686, 5541.5, 5277.5117, 4855.809]
2025-05-08 04:22:32,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:22:32,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 94/100 (estimated time remaining: 20 minutes, 22 seconds)
2025-05-08 04:25:15,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:25:27,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4849.85840 ± 1433.194
2025-05-08 04:25:27,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [785.325, 5439.3413, 5635.367, 5464.2217, 4440.8887, 4796.706, 4958.617, 5934.434, 5065.3735, 5978.3135]
2025-05-08 04:25:27,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:25:27,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 28 seconds)
2025-05-08 04:28:10,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:28:22,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4351.27930 ± 1529.821
2025-05-08 04:28:22,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4549.5967, 5243.2397, 4454.418, 504.7393, 4794.6025, 5498.2236, 5624.2036, 4729.9014, 2588.4153, 5525.454]
2025-05-08 04:28:22,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:28:22,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 33 seconds)
2025-05-08 04:31:04,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:31:17,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4830.28125 ± 874.583
2025-05-08 04:31:17,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3766.1997, 5224.6753, 5053.577, 5480.287, 5376.733, 5310.226, 5478.6934, 2716.3801, 5418.6943, 4477.3467]
2025-05-08 04:31:17,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:31:17,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 39 seconds)
2025-05-08 04:33:59,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:34:12,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5315.27393 ± 559.252
2025-05-08 04:34:12,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5152.7183, 6033.0117, 5065.7446, 5632.251, 4844.391, 5939.8306, 4017.9966, 5605.5186, 5524.776, 5336.4946]
2025-05-08 04:34:12,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:34:12,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (5315.27) for latency ExtremeSparseL4U32
2025-05-08 04:34:12,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 04:34:12,130 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:34:12,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 44 seconds)
2025-05-08 04:36:55,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:37:07,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5196.07715 ± 492.977
2025-05-08 04:37:07,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5716.7114, 4671.12, 4759.537, 5368.71, 5234.118, 5441.733, 5276.8643, 5617.829, 4136.1724, 5737.9785]
2025-05-08 04:37:07,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:37:07,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 49 seconds)
2025-05-08 04:39:50,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:40:02,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4937.35254 ± 741.261
2025-05-08 04:40:02,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4430.622, 5760.9814, 4521.7124, 5204.754, 5183.589, 5338.9565, 3201.0085, 4924.301, 4836.719, 5970.8774]
2025-05-08 04:40:02,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:40:02,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 55 seconds)
2025-05-08 04:42:45,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:42:57,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5200.22754 ± 422.943
2025-05-08 04:42:57,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5706.374, 5294.6465, 5011.4233, 5613.591, 4966.957, 5583.725, 4964.4263, 5642.926, 4896.561, 4321.643]
2025-05-08 04:42:57,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:42:57,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1149 [DEBUG]: Training session finished
