2025-05-06 07:26:31,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32
2025-05-06 07:26:31,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32
2025-05-06 07:26:31,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1008 [DEBUG]: args.trainer_eval_latencies: {'SparseU15': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x73222adcfa00>}
2025-05-06 07:26:31,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1009 [DEBUG]: using device: cpu
2025-05-06 07:26:31,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1031 [INFO]: Creating new trainer
2025-05-06 07:26:31,341 baseline-bpql-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-06 07:26:31,341 baseline-bpql-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 07:26:31,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1092 [DEBUG]: Starting training session...
2025-05-06 07:26:31,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 1/100
2025-05-06 07:29:19,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:29:45,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -440.57770 ± 3.115
2025-05-06 07:29:45,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-438.20016, -445.05283, -439.15543, -439.46356, -443.79947, -437.44717, -441.38098, -440.68555, -445.14807, -435.4433]
2025-05-06 07:29:45,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:29:45,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (-440.58) for latency SparseU15
2025-05-06 07:29:45,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:29:45,372 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:29:45,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 19 minutes, 15 seconds)
2025-05-06 07:32:43,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:33:08,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -183.34088 ± 75.151
2025-05-06 07:33:08,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-177.75755, -195.44649, -114.38228, -109.38801, -158.86955, -152.89427, -285.01367, -103.21995, -350.02026, -186.4168]
2025-05-06 07:33:08,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:33:08,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (-183.34) for latency SparseU15
2025-05-06 07:33:08,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:33:08,837 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:33:08,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 24 minutes, 11 seconds)
2025-05-06 07:36:07,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:36:32,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -116.96046 ± 70.050
2025-05-06 07:36:32,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-102.197426, -59.39396, -51.83958, -116.39925, -174.93227, -12.624072, -158.40863, -65.97239, -169.58835, -258.24866]
2025-05-06 07:36:32,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:36:32,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (-116.96) for latency SparseU15
2025-05-06 07:36:32,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:36:32,569 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:36:32,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 23 minutes, 42 seconds)
2025-05-06 07:39:31,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:39:56,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4.79826 ± 114.375
2025-05-06 07:39:56,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-133.55086, 26.767313, -46.438034, 71.4124, -89.17383, 184.18039, -126.035286, 17.836052, -64.4109, 207.39536]
2025-05-06 07:39:56,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:39:56,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (4.80) for latency SparseU15
2025-05-06 07:39:56,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:39:56,277 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:39:56,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 21 minutes, 45 seconds)
2025-05-06 07:42:49,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:43:14,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 135.75674 ± 90.815
2025-05-06 07:43:14,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [64.46729, 6.8150363, 61.15581, 210.30518, 89.61589, 191.10028, 304.34644, 43.216553, 172.66681, 213.87816]
2025-05-06 07:43:14,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:43:14,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (135.76) for latency SparseU15
2025-05-06 07:43:14,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:43:14,552 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:43:14,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 17 minutes, 30 seconds)
2025-05-06 07:46:07,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:46:32,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 369.97089 ± 182.162
2025-05-06 07:46:32,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [507.16257, 373.77658, 406.25668, 328.93417, 413.52347, 614.61346, -115.50888, 415.8252, 305.66397, 449.46133]
2025-05-06 07:46:32,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:46:32,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (369.97) for latency SparseU15
2025-05-06 07:46:32,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:46:32,582 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:46:32,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 15 minutes, 35 seconds)
2025-05-06 07:49:25,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:49:50,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 705.60925 ± 78.144
2025-05-06 07:49:50,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [823.80975, 603.21735, 744.8313, 688.6233, 610.6399, 749.6529, 790.9631, 694.02356, 760.4596, 589.8719]
2025-05-06 07:49:50,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:49:50,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (705.61) for latency SparseU15
2025-05-06 07:49:50,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:49:50,570 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:49:50,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 10 minutes, 31 seconds)
2025-05-06 07:52:43,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:53:08,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 815.87024 ± 136.717
2025-05-06 07:53:08,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1046.54, 896.8145, 836.3219, 768.16846, 961.4518, 625.747, 847.3498, 858.28876, 572.0605, 745.9596]
2025-05-06 07:53:08,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:53:08,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (815.87) for latency SparseU15
2025-05-06 07:53:08,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:53:08,471 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:53:08,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 5 minutes, 24 seconds)
2025-05-06 07:56:01,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:56:26,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 894.14050 ± 74.233
2025-05-06 07:56:26,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [830.416, 902.90405, 845.347, 825.5292, 792.8601, 856.8646, 975.4403, 905.8145, 1032.5752, 973.6546]
2025-05-06 07:56:26,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:56:26,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (894.14) for latency SparseU15
2025-05-06 07:56:26,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:56:26,282 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:56:26,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 18 seconds)
2025-05-06 07:59:19,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:59:44,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 979.19958 ± 80.157
2025-05-06 07:59:44,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [957.398, 958.1476, 777.3754, 1026.0936, 1035.2338, 1048.6946, 948.5722, 999.4814, 958.56714, 1082.4326]
2025-05-06 07:59:44,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:59:44,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (979.20) for latency SparseU15
2025-05-06 07:59:44,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:59:44,172 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:59:44,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 56 minutes, 53 seconds)
2025-05-06 08:02:36,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:03:02,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1063.98267 ± 140.241
2025-05-06 08:03:02,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1119.4805, 829.75073, 924.1103, 1086.9541, 1101.0701, 1164.1859, 1125.9019, 855.6005, 1134.189, 1298.5841]
2025-05-06 08:03:02,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:03:02,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1063.98) for latency SparseU15
2025-05-06 08:03:02,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 08:03:02,078 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 08:03:02,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 53 minutes, 33 seconds)
2025-05-06 08:05:55,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:06:20,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1090.70105 ± 212.596
2025-05-06 08:06:20,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [986.78864, 789.33594, 1075.7156, 1038.8011, 1467.1592, 1506.5303, 946.9451, 997.6892, 1051.1874, 1046.8588]
2025-05-06 08:06:20,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:06:20,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1090.70) for latency SparseU15
2025-05-06 08:06:20,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 08:06:20,114 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 08:06:20,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 50 minutes, 15 seconds)
2025-05-06 08:09:13,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:09:38,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1188.66284 ± 160.357
2025-05-06 08:09:38,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1423.1747, 1320.695, 948.6268, 1411.7377, 1219.609, 1243.3839, 1056.7393, 1043.876, 1003.4498, 1215.3356]
2025-05-06 08:09:38,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:09:38,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1188.66) for latency SparseU15
2025-05-06 08:09:38,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 08:09:38,213 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 08:09:38,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 47 minutes, 1 second)
2025-05-06 08:12:31,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:12:56,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1227.22729 ± 157.031
2025-05-06 08:12:56,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1028.66, 1278.079, 1169.664, 1074.5782, 1461.2439, 1202.3624, 1132.4104, 1329.0765, 1082.2692, 1513.9285]
2025-05-06 08:12:56,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:12:56,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1227.23) for latency SparseU15
2025-05-06 08:12:56,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 08:12:56,648 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 08:12:56,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 43 minutes, 54 seconds)
2025-05-06 08:15:49,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:16:14,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1186.54114 ± 190.097
2025-05-06 08:16:14,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1058.6576, 1208.701, 1178.5984, 1472.9965, 1103.5121, 782.22687, 1149.5754, 1224.8054, 1484.8438, 1201.4944]
2025-05-06 08:16:14,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:16:14,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 40 minutes, 39 seconds)
2025-05-06 08:19:07,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:19:32,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1329.91406 ± 173.521
2025-05-06 08:19:32,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1357.5637, 1356.6102, 1134.2921, 1131.0275, 1269.443, 1538.3981, 1211.541, 1146.772, 1525.4397, 1628.0533]
2025-05-06 08:19:32,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:19:32,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1329.91) for latency SparseU15
2025-05-06 08:19:32,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 08:19:32,669 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 08:19:32,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 37 minutes, 21 seconds)
2025-05-06 08:22:25,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:22:50,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1303.52173 ± 219.056
2025-05-06 08:22:50,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1539.8091, 1309.6837, 1605.4591, 1399.6731, 1155.939, 1080.9663, 1418.1786, 1418.4504, 828.90015, 1278.157]
2025-05-06 08:22:50,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:22:50,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 34 minutes, 3 seconds)
2025-05-06 08:25:43,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:26:08,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1316.18848 ± 248.697
2025-05-06 08:26:08,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1234.9789, 1867.8552, 1187.1432, 1679.2651, 1156.676, 1166.5409, 1261.8264, 1037.4131, 1404.7828, 1165.403]
2025-05-06 08:26:08,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:26:08,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 30 minutes, 41 seconds)
2025-05-06 08:29:01,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:29:26,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1492.56018 ± 337.344
2025-05-06 08:29:26,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1180.4727, 1171.9622, 1216.0934, 1348.5033, 1900.0372, 2058.0674, 1929.3486, 1602.9412, 1112.7662, 1405.409]
2025-05-06 08:29:26,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:29:26,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1492.56) for latency SparseU15
2025-05-06 08:29:26,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 08:29:26,693 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 08:29:26,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 27 minutes, 18 seconds)
2025-05-06 08:32:19,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:32:44,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1416.24890 ± 292.389
2025-05-06 08:32:44,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1502.7435, 1220.779, 1301.4653, 1238.7803, 1347.7949, 1783.5549, 2077.112, 1465.7596, 1090.6699, 1133.8293]
2025-05-06 08:32:44,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:32:44,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 23 minutes, 57 seconds)
2025-05-06 08:35:37,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:36:02,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1359.17676 ± 195.215
2025-05-06 08:36:02,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1843.7206, 1324.6589, 1309.0168, 1150.8029, 1402.3862, 1101.8419, 1382.8474, 1315.6532, 1262.6595, 1498.1808]
2025-05-06 08:36:02,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:36:02,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 20 minutes, 41 seconds)
2025-05-06 08:38:55,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:39:20,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1382.31519 ± 158.956
2025-05-06 08:39:20,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1213.8009, 1537.9465, 1688.1534, 1335.0148, 1126.8618, 1465.3063, 1249.2852, 1493.0492, 1377.8434, 1335.8893]
2025-05-06 08:39:20,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:39:20,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 17 minutes, 26 seconds)
2025-05-06 08:42:13,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:42:38,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1407.23926 ± 234.216
2025-05-06 08:42:38,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1233.8044, 1128.7979, 1047.3484, 1161.6129, 1618.0005, 1485.752, 1720.9781, 1667.8527, 1416.3702, 1591.876]
2025-05-06 08:42:38,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:42:38,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 14 minutes, 11 seconds)
2025-05-06 08:45:32,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:45:57,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1683.52209 ± 365.289
2025-05-06 08:45:57,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1215.258, 1876.0249, 1621.153, 2323.7253, 1753.0947, 1999.3948, 1373.6721, 2071.1248, 1199.3839, 1402.3889]
2025-05-06 08:45:57,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:45:57,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1683.52) for latency SparseU15
2025-05-06 08:45:57,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 08:45:57,169 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 08:45:57,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 10 minutes, 55 seconds)
2025-05-06 08:48:50,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:49:15,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1475.30383 ± 384.569
2025-05-06 08:49:15,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1181.552, 1381.6553, 1093.8546, 1207.1177, 1763.7324, 1110.9235, 2179.267, 2077.9016, 1569.638, 1187.3961]
2025-05-06 08:49:15,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:49:15,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 7 minutes, 46 seconds)
2025-05-06 08:52:08,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:52:33,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1410.64453 ± 401.980
2025-05-06 08:52:33,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [973.7751, 1122.2585, 2111.1768, 1535.0372, 1134.7427, 1686.7733, 1053.908, 1153.3262, 1244.8677, 2090.5806]
2025-05-06 08:52:33,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:52:33,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 4 minutes, 31 seconds)
2025-05-06 08:55:27,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:55:52,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1590.52917 ± 599.223
2025-05-06 08:55:52,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1138.012, 1283.5739, 2858.3982, 1517.2896, 1217.7609, 1096.7328, 1084.0541, 1215.9674, 2384.5505, 2108.9521]
2025-05-06 08:55:52,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:55:52,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 1 minute, 13 seconds)
2025-05-06 08:58:45,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:59:11,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1327.82263 ± 210.486
2025-05-06 08:59:11,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1436.5723, 1090.9205, 1097.1069, 1183.012, 1166.0464, 1519.6877, 1141.011, 1671.3888, 1618.1185, 1354.3634]
2025-05-06 08:59:11,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:59:11,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 58 minutes, 6 seconds)
2025-05-06 09:02:05,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:02:30,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1561.92346 ± 403.436
2025-05-06 09:02:30,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2099.089, 1935.2761, 1399.284, 1486.5696, 1205.522, 1513.0248, 2345.1206, 1379.3292, 1041.9214, 1214.0972]
2025-05-06 09:02:30,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:02:30,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 55 minutes)
2025-05-06 09:05:24,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:05:49,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1545.37146 ± 301.125
2025-05-06 09:05:49,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2122.2988, 1867.9924, 1237.1593, 1185.2676, 1523.0385, 1686.3229, 1488.5688, 1797.3544, 1203.7699, 1341.9434]
2025-05-06 09:05:49,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:05:49,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 51 minutes, 53 seconds)
2025-05-06 09:08:43,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:09:08,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1482.43359 ± 400.886
2025-05-06 09:09:08,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1168.1393, 1659.731, 1317.2903, 1260.9404, 2100.535, 2110.4333, 728.93024, 1358.5442, 1439.4823, 1680.3109]
2025-05-06 09:09:08,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:09:08,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 48 minutes, 49 seconds)
2025-05-06 09:12:02,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:12:27,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1681.86548 ± 484.120
2025-05-06 09:12:27,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1714.0026, 2610.6997, 1717.5812, 1482.9076, 2533.3738, 1578.0454, 1484.629, 1166.5942, 1104.2007, 1426.6207]
2025-05-06 09:12:27,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:12:27,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 45 minutes, 39 seconds)
2025-05-06 09:15:21,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:15:46,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1615.21265 ± 426.835
2025-05-06 09:15:46,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1574.6604, 1433.8895, 1405.4363, 2534.009, 1168.067, 1688.8538, 2212.9983, 1578.9908, 1046.0968, 1509.1238]
2025-05-06 09:15:46,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:15:46,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 42 minutes, 17 seconds)
2025-05-06 09:18:39,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:19:04,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1482.39331 ± 166.822
2025-05-06 09:19:04,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1754.1174, 1612.8882, 1581.1013, 1445.0887, 1114.5046, 1456.207, 1599.2443, 1427.4435, 1503.1493, 1330.1898]
2025-05-06 09:19:04,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:19:04,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 38 minutes, 47 seconds)
2025-05-06 09:21:57,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:22:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1793.75293 ± 537.390
2025-05-06 09:22:22,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1851.2523, 2062.7617, 2790.4094, 1571.6937, 2280.961, 1317.9199, 1613.0333, 766.24347, 1515.4657, 2167.7883]
2025-05-06 09:22:22,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:22:22,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1793.75) for latency SparseU15
2025-05-06 09:22:22,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 09:22:22,945 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 09:22:22,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 35 minutes, 14 seconds)
2025-05-06 09:25:16,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:25:41,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1718.35876 ± 450.204
2025-05-06 09:25:41,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1463.6196, 2067.9087, 1271.966, 2755.002, 2055.638, 1583.7822, 1278.5994, 1716.0011, 1761.3328, 1229.7374]
2025-05-06 09:25:41,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:25:41,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 31 minutes, 44 seconds)
2025-05-06 09:28:34,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:28:59,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1687.27637 ± 492.278
2025-05-06 09:28:59,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1782.5912, 2456.945, 1556.1731, 2246.8853, 1062.7095, 2213.3125, 1930.477, 1005.1295, 1394.4321, 1224.109]
2025-05-06 09:28:59,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:28:59,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 28 minutes, 15 seconds)
2025-05-06 09:31:53,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:32:18,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1628.19507 ± 641.404
2025-05-06 09:32:18,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1251.4275, 1907.5725, 1505.3813, 1832.1362, 344.19153, 1172.8121, 1202.2479, 2054.512, 2702.8003, 2308.869]
2025-05-06 09:32:18,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:32:18,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 25 minutes, 1 second)
2025-05-06 09:35:10,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:35:36,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1525.48242 ± 514.497
2025-05-06 09:35:36,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1557.8649, 1443.1644, 330.39383, 1650.3906, 1874.4247, 1358.188, 1461.3236, 1250.1155, 1891.4545, 2437.505]
2025-05-06 09:35:36,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:35:36,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 21 minutes, 34 seconds)
2025-05-06 09:38:28,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:38:53,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1522.14392 ± 476.795
2025-05-06 09:38:53,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1289.1466, 1649.7946, 2631.3118, 1259.1859, 2126.5222, 1195.3539, 1281.5714, 1570.7726, 987.46606, 1230.3158]
2025-05-06 09:38:53,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:38:53,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 18 minutes, 10 seconds)
2025-05-06 09:41:46,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:42:11,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1491.46643 ± 215.142
2025-05-06 09:42:11,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1853.7095, 1364.6332, 1688.4027, 1372.734, 1425.1838, 1863.0084, 1269.909, 1262.7996, 1455.3025, 1358.9808]
2025-05-06 09:42:11,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:42:11,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 14 minutes, 47 seconds)
2025-05-06 09:45:04,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:45:29,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1504.98389 ± 294.503
2025-05-06 09:45:29,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1517.479, 1188.9022, 1591.4043, 1379.8799, 1470.2765, 2041.7563, 1301.7197, 2040.4601, 1205.4861, 1312.4744]
2025-05-06 09:45:29,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:45:29,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 11 minutes, 28 seconds)
2025-05-06 09:48:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:48:47,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1517.31885 ± 490.811
2025-05-06 09:48:47,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1223.3942, 2874.5574, 1316.9933, 948.4929, 1302.6783, 1433.9921, 1663.4775, 1317.4982, 1525.8342, 1566.2703]
2025-05-06 09:48:47,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:48:47,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 7 minutes, 59 seconds)
2025-05-06 09:51:40,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:52:05,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1502.65161 ± 228.836
2025-05-06 09:52:05,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1474.0386, 1762.1198, 2078.7073, 1348.4661, 1404.3776, 1379.739, 1448.1389, 1311.8911, 1505.517, 1313.521]
2025-05-06 09:52:05,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:52:05,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 4 minutes, 47 seconds)
2025-05-06 09:54:58,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:55:23,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1413.90942 ± 103.929
2025-05-06 09:55:23,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1508.6947, 1544.2705, 1530.1448, 1307.8124, 1266.4259, 1280.9135, 1532.3235, 1384.409, 1423.5164, 1360.584]
2025-05-06 09:55:23,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:55:23,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 1 minute, 31 seconds)
2025-05-06 09:58:16,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:58:41,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1527.39893 ± 254.028
2025-05-06 09:58:41,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1742.465, 1474.572, 1886.8793, 1358.1906, 1256.1862, 1267.9685, 1291.3248, 1990.0905, 1387.9346, 1618.3773]
2025-05-06 09:58:41,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:58:41,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 58 minutes, 13 seconds)
2025-05-06 10:01:34,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:01:59,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1444.81616 ± 197.701
2025-05-06 10:01:59,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1451.2083, 1401.0314, 1199.6534, 1475.941, 1307.0895, 1456.6759, 1960.7891, 1402.1727, 1262.5272, 1531.0724]
2025-05-06 10:01:59,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:01:59,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 54 minutes, 55 seconds)
2025-05-06 10:04:52,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:05:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1428.65283 ± 358.603
2025-05-06 10:05:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1891.6559, 1532.109, 643.8888, 1207.4539, 1292.5186, 1721.6764, 1776.0718, 1173.5914, 1314.2002, 1733.363]
2025-05-06 10:05:17,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:05:17,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 51 minutes, 34 seconds)
2025-05-06 10:08:10,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:08:35,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1655.72534 ± 404.716
2025-05-06 10:08:35,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1269.9518, 1341.8607, 2382.787, 1562.4486, 2108.347, 1428.677, 1335.5952, 2277.2783, 1454.7054, 1395.6036]
2025-05-06 10:08:35,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:08:35,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 48 minutes, 12 seconds)
2025-05-06 10:11:27,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:11:52,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1629.44958 ± 358.837
2025-05-06 10:11:52,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2280.938, 1206.6648, 1342.7521, 2203.5293, 1861.1472, 1602.3138, 1619.0131, 1267.6656, 1339.9683, 1570.5034]
2025-05-06 10:11:52,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:11:52,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 44 minutes, 47 seconds)
2025-05-06 10:14:44,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:15:10,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1360.79822 ± 375.879
2025-05-06 10:15:10,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1579.6487, 1404.5935, 808.82495, 519.0812, 1814.8018, 1630.5292, 1437.1971, 1575.1007, 1497.9617, 1340.2428]
2025-05-06 10:15:10,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:15:10,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 41 minutes, 22 seconds)
2025-05-06 10:18:02,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:18:27,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1507.72205 ± 156.247
2025-05-06 10:18:27,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1743.1006, 1362.8586, 1499.1423, 1436.6943, 1341.2156, 1537.1046, 1794.5784, 1423.8828, 1613.8411, 1324.8026]
2025-05-06 10:18:27,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:18:27,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 37 minutes, 58 seconds)
2025-05-06 10:21:19,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:21:44,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1795.86035 ± 316.073
2025-05-06 10:21:44,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1856.5944, 1788.8025, 1450.7576, 1507.0946, 1619.1653, 2267.845, 1459.5394, 1798.785, 1762.5626, 2447.4573]
2025-05-06 10:21:44,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:21:44,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1795.86) for latency SparseU15
2025-05-06 10:21:44,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 10:21:44,777 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 10:21:44,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 34 minutes, 38 seconds)
2025-05-06 10:24:37,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:25:02,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1629.97949 ± 215.978
2025-05-06 10:25:02,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1678.3313, 1677.7079, 1543.6898, 2034.7931, 1521.4563, 1417.2375, 1880.0758, 1789.4049, 1280.8401, 1476.2582]
2025-05-06 10:25:02,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:25:02,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 31 minutes, 20 seconds)
2025-05-06 10:27:55,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:28:20,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1752.32874 ± 382.130
2025-05-06 10:28:20,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1827.1836, 1354.8607, 2081.6448, 1675.5183, 1534.6735, 2236.349, 1306.3306, 1166.5651, 2100.8464, 2239.3147]
2025-05-06 10:28:20,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:28:20,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 28 minutes, 7 seconds)
2025-05-06 10:31:13,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:31:38,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1606.37708 ± 387.485
2025-05-06 10:31:38,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1588.0679, 1621.1359, 1789.9052, 1046.1659, 1254.1674, 1273.3795, 1892.1014, 2472.6611, 1766.322, 1359.8644]
2025-05-06 10:31:38,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:31:38,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 24 minutes, 58 seconds)
2025-05-06 10:34:32,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:34:57,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1539.13293 ± 422.061
2025-05-06 10:34:57,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1371.1761, 2030.862, 1270.1163, 697.39746, 1267.8746, 1474.7594, 1446.0372, 2261.55, 1769.8129, 1801.743]
2025-05-06 10:34:57,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:34:57,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 21 minutes, 52 seconds)
2025-05-06 10:37:50,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:38:15,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1451.83569 ± 482.420
2025-05-06 10:38:15,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1907.5251, 1334.134, 675.5438, 1573.3363, 1367.8662, 802.0188, 1614.3071, 1352.168, 2462.8767, 1428.5815]
2025-05-06 10:38:15,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:38:15,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 18 minutes, 42 seconds)
2025-05-06 10:41:12,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:41:37,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1518.55981 ± 161.773
2025-05-06 10:41:37,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1426.9697, 1785.7922, 1439.7404, 1470.9268, 1405.6118, 1854.2554, 1589.1626, 1356.6174, 1413.1735, 1443.349]
2025-05-06 10:41:37,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:41:37,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 15 minutes, 57 seconds)
2025-05-06 10:44:41,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:45:06,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1668.63940 ± 395.243
2025-05-06 10:45:06,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1432.7146, 1391.3284, 1342.7843, 1885.3445, 1382.3435, 1258.2255, 1965.3055, 1832.9781, 1576.6853, 2618.6838]
2025-05-06 10:45:06,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:45:06,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 14 minutes, 8 seconds)
2025-05-06 10:48:10,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:48:35,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1553.05469 ± 183.302
2025-05-06 10:48:35,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1324.2928, 1380.6675, 1403.2297, 1562.1384, 1685.3496, 1941.7305, 1755.0941, 1552.3993, 1516.404, 1409.2406]
2025-05-06 10:48:35,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:48:35,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 12 minutes, 11 seconds)
2025-05-06 10:51:39,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:52:04,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1746.24512 ± 343.456
2025-05-06 10:52:04,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1394.0403, 1634.8394, 1444.9734, 2395.6526, 1971.3524, 1660.407, 1673.0536, 2237.2761, 1779.036, 1271.8208]
2025-05-06 10:52:04,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:52:04,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 10 minutes, 10 seconds)
2025-05-06 10:55:08,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:55:33,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1592.39172 ± 288.711
2025-05-06 10:55:33,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1477.6418, 1414.0968, 1418.3027, 1961.4011, 1753.8682, 1362.2089, 1456.838, 2263.749, 1353.6703, 1462.1409]
2025-05-06 10:55:33,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:55:33,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 8 minutes, 2 seconds)
2025-05-06 10:58:37,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:59:02,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1822.13159 ± 507.382
2025-05-06 10:59:02,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1736.0233, 1670.4955, 1444.0149, 1755.6117, 1470.5261, 1343.5791, 1356.9619, 2977.8008, 2510.726, 1955.5767]
2025-05-06 10:59:02,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:59:02,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1822.13) for latency SparseU15
2025-05-06 10:59:02,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 10:59:02,767 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 10:59:02,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 5 minutes, 28 seconds)
2025-05-06 11:02:06,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:02:31,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1872.34399 ± 492.096
2025-05-06 11:02:31,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1424.6577, 1846.7195, 2928.9695, 2103.1648, 1873.9386, 1724.4708, 1238.4404, 1692.3352, 1404.9995, 2485.7424]
2025-05-06 11:02:31,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:02:31,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1872.34) for latency SparseU15
2025-05-06 11:02:31,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 11:02:31,921 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 11:02:31,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 1 minute, 59 seconds)
2025-05-06 11:05:36,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:06:01,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1663.25220 ± 350.587
2025-05-06 11:06:01,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1414.4196, 2633.0894, 1321.211, 1745.0153, 1522.1375, 1470.1472, 1624.9244, 1787.6915, 1501.3906, 1612.4944]
2025-05-06 11:06:01,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:06:01,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 58 minutes, 31 seconds)
2025-05-06 11:09:05,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:09:30,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1926.08923 ± 602.915
2025-05-06 11:09:30,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2357.0723, 1938.3169, 1598.5692, 3052.196, 1362.8766, 1368.1332, 1387.7295, 1388.965, 2856.8645, 1950.1703]
2025-05-06 11:09:30,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:09:30,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1926.09) for latency SparseU15
2025-05-06 11:09:30,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 11:09:30,637 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 11:09:30,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 55 minutes, 3 seconds)
2025-05-06 11:12:35,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:13:00,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1652.18774 ± 430.685
2025-05-06 11:13:00,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2629.9102, 1246.757, 1437.542, 2103.2832, 1446.044, 1572.1136, 1476.6273, 1280.7368, 1299.0428, 2029.8206]
2025-05-06 11:13:00,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:13:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 51 minutes, 38 seconds)
2025-05-06 11:16:04,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:16:29,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1726.84985 ± 344.644
2025-05-06 11:16:29,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1393.9457, 2083.805, 1350.9325, 2283.5369, 1414.4792, 1787.2004, 1595.0907, 2277.595, 1601.4822, 1480.4305]
2025-05-06 11:16:29,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:16:29,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 48 minutes, 8 seconds)
2025-05-06 11:19:34,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:19:59,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1823.84607 ± 418.822
2025-05-06 11:19:59,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1907.0927, 1559.5265, 1693.3291, 1646.8644, 1906.1749, 3012.007, 1777.4462, 1689.4849, 1454.1335, 1592.4031]
2025-05-06 11:19:59,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:19:59,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 44 minutes, 47 seconds)
2025-05-06 11:22:52,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:23:17,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1689.30078 ± 371.687
2025-05-06 11:23:17,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1464.0885, 2133.0332, 1413.7843, 2059.6147, 1498.1997, 1209.5833, 1312.0557, 2300.4692, 1495.4265, 2006.7529]
2025-05-06 11:23:17,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:23:17,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 40 minutes, 10 seconds)
2025-05-06 11:26:10,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:26:35,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1476.60864 ± 139.718
2025-05-06 11:26:35,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1333.5977, 1688.5017, 1408.0977, 1497.8226, 1576.5171, 1513.4182, 1249.0929, 1342.4705, 1471.171, 1685.397]
2025-05-06 11:26:35,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:26:35,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 35 minutes, 36 seconds)
2025-05-06 11:29:27,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:29:52,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1889.81348 ± 421.714
2025-05-06 11:29:52,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2689.9646, 1403.7175, 1621.5559, 1707.3702, 2536.9778, 1901.3263, 1640.3236, 1861.3375, 1390.3872, 2145.1743]
2025-05-06 11:29:52,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:29:52,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 31 minutes, 4 seconds)
2025-05-06 11:32:44,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:33:09,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1623.39856 ± 460.014
2025-05-06 11:33:09,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1415.2345, 1505.5138, 1583.9937, 1324.3914, 1651.5935, 1738.5394, 1511.5927, 2924.483, 1176.7208, 1401.9225]
2025-05-06 11:33:09,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:33:09,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 26 minutes, 41 seconds)
2025-05-06 11:36:01,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:36:26,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1861.68140 ± 364.629
2025-05-06 11:36:26,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2421.688, 1436.5258, 1734.1688, 1667.3066, 1834.9019, 2320.7256, 1966.8182, 1363.7742, 2322.6528, 1548.2512]
2025-05-06 11:36:26,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:36:26,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 22 minutes, 15 seconds)
2025-05-06 11:39:19,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:39:44,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1610.09534 ± 267.687
2025-05-06 11:39:44,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1407.2802, 1543.2194, 1563.9346, 1805.0549, 1356.8724, 1485.7833, 1429.8137, 2030.5685, 2132.022, 1346.4048]
2025-05-06 11:39:44,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:39:44,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 18 minutes, 56 seconds)
2025-05-06 11:42:36,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:43:01,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1627.33264 ± 419.039
2025-05-06 11:43:01,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1610.4517, 1225.9264, 2657.6648, 1622.0305, 1533.8285, 1344.5485, 1552.2155, 1339.1072, 1277.452, 2110.101]
2025-05-06 11:43:01,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:43:02,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 15 minutes, 40 seconds)
2025-05-06 11:45:54,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:46:19,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1642.84546 ± 240.059
2025-05-06 11:46:19,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1475.1727, 1803.1246, 1633.0602, 1728.4528, 1312.4012, 1414.0228, 1693.0305, 2194.9053, 1728.58, 1445.7054]
2025-05-06 11:46:19,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:46:19,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 12 minutes, 24 seconds)
2025-05-06 11:49:12,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:49:37,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1549.34875 ± 215.408
2025-05-06 11:49:37,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1416.3373, 1955.1849, 1421.5419, 1540.9886, 1627.7394, 1824.083, 1691.5732, 1331.4924, 1465.0189, 1219.5287]
2025-05-06 11:49:37,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:49:37,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 9 minutes, 8 seconds)
2025-05-06 11:52:29,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:52:54,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1608.79321 ± 190.602
2025-05-06 11:52:54,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1563.0381, 1592.1973, 1590.4176, 1440.8988, 1426.1338, 1479.3478, 1661.262, 1471.3247, 2093.1233, 1770.1882]
2025-05-06 11:52:54,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:52:54,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 5 minutes, 50 seconds)
2025-05-06 11:55:46,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:56:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1812.16675 ± 407.040
2025-05-06 11:56:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2564.0393, 1377.3322, 1911.4698, 2475.3389, 1807.7504, 1403.0946, 1444.3883, 1792.6921, 1908.263, 1437.2988]
2025-05-06 11:56:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:56:11,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 2 minutes, 31 seconds)
2025-05-06 11:59:03,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:59:28,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1850.19507 ± 403.156
2025-05-06 11:59:28,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1495.3407, 1346.1589, 2480.5115, 1718.6841, 1584.9061, 1934.9902, 1944.0776, 1822.4099, 1525.718, 2649.1553]
2025-05-06 11:59:28,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:59:28,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 83/100 (estimated time remaining: 59 minutes, 12 seconds)
2025-05-06 12:02:21,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:02:46,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1743.03088 ± 231.626
2025-05-06 12:02:46,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2254.7058, 1547.5096, 1579.8386, 1546.7539, 1567.8176, 1878.0273, 1933.8552, 1933.8398, 1543.3181, 1644.643]
2025-05-06 12:02:46,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:02:46,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 84/100 (estimated time remaining: 55 minutes, 54 seconds)
2025-05-06 12:05:38,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:06:03,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1969.68726 ± 346.874
2025-05-06 12:06:03,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1527.9592, 1861.0605, 2085.3213, 2223.0308, 2162.466, 2416.3196, 1415.1713, 1975.9873, 2445.1846, 1584.3728]
2025-05-06 12:06:03,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:06:03,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (1969.69) for latency SparseU15
2025-05-06 12:06:03,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:06:03,425 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 12:06:03,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 85/100 (estimated time remaining: 52 minutes, 35 seconds)
2025-05-06 12:08:55,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:09:20,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2136.35840 ± 547.037
2025-05-06 12:09:20,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2516.562, 3216.3975, 2454.4429, 2023.6725, 1410.7811, 2680.6724, 2063.1384, 1889.6995, 1536.6656, 1571.5527]
2025-05-06 12:09:20,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:09:20,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1124 [INFO]: New best (2136.36) for latency SparseU15
2025-05-06 12:09:20,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:09:20,701 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 12:09:20,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 86/100 (estimated time remaining: 49 minutes, 19 seconds)
2025-05-06 12:12:13,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:12:38,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1788.67456 ± 402.025
2025-05-06 12:12:38,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1532.6443, 1347.9258, 2493.6008, 1437.0977, 1817.1978, 1748.6858, 1868.4501, 1951.9125, 2433.7756, 1255.4559]
2025-05-06 12:12:38,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:12:38,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 87/100 (estimated time remaining: 46 minutes, 2 seconds)
2025-05-06 12:15:30,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:15:55,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1976.23706 ± 481.692
2025-05-06 12:15:55,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1888.5237, 1813.285, 2449.099, 1758.8099, 1434.0725, 2940.9783, 1435.8588, 1970.4009, 2531.3396, 1540.0044]
2025-05-06 12:15:55,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:15:55,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 88/100 (estimated time remaining: 42 minutes, 45 seconds)
2025-05-06 12:18:47,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:19:12,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1999.47400 ± 506.750
2025-05-06 12:19:12,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1936.9047, 1882.6678, 3081.8833, 2176.0654, 1451.1204, 1379.8911, 2163.967, 2562.851, 1440.8918, 1918.4955]
2025-05-06 12:19:12,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:19:12,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 89/100 (estimated time remaining: 39 minutes, 28 seconds)
2025-05-06 12:22:05,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:22:30,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1526.14001 ± 226.880
2025-05-06 12:22:30,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1736.4731, 1970.4221, 1435.4012, 1470.8362, 1860.4965, 1319.7747, 1428.7916, 1332.8296, 1333.5054, 1372.8694]
2025-05-06 12:22:30,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:22:30,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 90/100 (estimated time remaining: 36 minutes, 10 seconds)
2025-05-06 12:25:22,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:25:47,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1695.77673 ± 355.635
2025-05-06 12:25:47,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1942.8713, 1315.3336, 1667.616, 1430.7238, 1301.5536, 2465.117, 1911.167, 1992.3143, 1453.7421, 1477.3289]
2025-05-06 12:25:47,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:25:47,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 91/100 (estimated time remaining: 32 minutes, 53 seconds)
2025-05-06 12:28:39,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:29:04,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1789.56860 ± 555.549
2025-05-06 12:29:04,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1943.5594, 1468.5033, 1387.6971, 1338.3069, 3277.144, 1388.1952, 1887.6726, 2101.1646, 1534.742, 1568.7001]
2025-05-06 12:29:04,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:29:04,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 92/100 (estimated time remaining: 29 minutes, 35 seconds)
2025-05-06 12:31:56,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:32:21,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2045.48767 ± 695.604
2025-05-06 12:32:21,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1340.5002, 2111.398, 1938.2103, 1388.9794, 1353.0269, 3389.6836, 1286.6088, 2159.3809, 2893.521, 2593.5686]
2025-05-06 12:32:21,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:32:21,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 93/100 (estimated time remaining: 26 minutes, 18 seconds)
2025-05-06 12:35:14,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:35:39,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1785.82654 ± 521.819
2025-05-06 12:35:39,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2308.6165, 1322.0104, 1479.0458, 1874.9545, 1388.2081, 1443.3264, 3062.644, 1349.6261, 1946.8677, 1682.967]
2025-05-06 12:35:39,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:35:39,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes, 1 second)
2025-05-06 12:38:32,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:38:57,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1875.08862 ± 734.182
2025-05-06 12:38:57,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1526.0513, 1941.4675, 1281.4895, 3356.0288, 1306.3582, 1763.736, 1437.3555, 3230.0342, 1474.0798, 1434.2858]
2025-05-06 12:38:57,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:38:57,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 44 seconds)
2025-05-06 12:41:50,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:42:15,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2002.20764 ± 468.131
2025-05-06 12:42:15,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2187.598, 1397.378, 2224.867, 1522.1763, 2648.5413, 2020.7544, 2796.9634, 1516.1595, 2154.8728, 1552.7659]
2025-05-06 12:42:15,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:42:15,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 27 seconds)
2025-05-06 12:45:08,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:45:33,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2025.16443 ± 400.541
2025-05-06 12:45:33,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2423.2615, 1778.4783, 1873.4962, 1490.1267, 1766.1746, 2005.1244, 2301.7686, 1539.2897, 2243.8835, 2830.04]
2025-05-06 12:45:33,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:45:33,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 11 seconds)
2025-05-06 12:48:26,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:48:51,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1957.70190 ± 675.265
2025-05-06 12:48:51,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2394.9753, 1415.1138, 3039.9343, 1465.7914, 1608.6486, 3322.2363, 1396.5938, 1780.706, 1434.313, 1718.7076]
2025-05-06 12:48:51,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:48:51,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 54 seconds)
2025-05-06 12:51:44,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:52:09,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2014.63989 ± 498.213
2025-05-06 12:52:09,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1468.4158, 2360.8096, 1253.2828, 2888.6477, 1914.0466, 2020.2654, 2236.9028, 2567.9658, 2016.9801, 1419.0812]
2025-05-06 12:52:09,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:52:09,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 36 seconds)
2025-05-06 12:55:02,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:55:27,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2011.34668 ± 609.691
2025-05-06 12:55:27,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2443.513, 1672.115, 1583.1649, 3325.4707, 1474.6477, 1682.3091, 2548.8315, 1514.897, 1404.3627, 2464.1545]
2025-05-06 12:55:27,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:55:27,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1097 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 18 seconds)
2025-05-06 12:58:19,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:58:44,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1761.33044 ± 505.368
2025-05-06 12:58:44,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1717.7155, 1390.1217, 1371.8892, 1973.7244, 1248.4342, 3119.489, 1618.9255, 1513.6958, 1926.8674, 1732.4415]
2025-05-06 12:58:44,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:58:44,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1149 [DEBUG]: Training session finished
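
Each "Total Reward" line above summarizes the ten per-episode returns from the preceding "All rewards" list as mean ± standard deviation. A minimal sketch reproducing the final evaluation's summary (iteration 100/100); note the logged ± term is consistent with the *population* standard deviation (ddof=0), which is an inference from the numbers, not confirmed by the source:

```python
from statistics import mean, pstdev

# Per-episode returns from the final SparseU15 evaluation (iteration 100/100).
rewards = [1717.7155, 1390.1217, 1371.8892, 1973.7244, 1248.4342,
           3119.489, 1618.9255, 1513.6958, 1926.8674, 1732.4415]

# Mean and population standard deviation (ddof=0); pstdev divides by N,
# matching the logged "± 505.368" term, whereas stdev (ddof=1) would not.
avg = mean(rewards)
std = pstdev(rewards)
print(f"Total Reward: {avg:.5f} \u00b1 {std:.3f}")
```

Running this against the last evaluation reproduces the logged summary `Total Reward: 1761.33044 ± 505.368` up to float32/float64 rounding.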
