2025-05-09 19:39:10,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16
2025-05-09 19:39:10,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16
2025-05-09 19:39:10,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7848fc1c5c70>}
2025-05-09 19:39:10,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1111 [DEBUG]: using device: cpu
2025-05-09 19:39:10,258 baseline-bpql-noisy-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-05-09 19:39:10,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-09 19:39:10,264 baseline-bpql-noisy-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-09 19:39:10,264 baseline-bpql-noisy-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 19:39:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-09 19:39:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-09 19:41:35,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:41:35,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 52.51504 ± 0.977
2025-05-09 19:41:35,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [51.824352, 51.91338, 52.79451, 52.467377, 53.64664, 51.837166, 54.682796, 51.388134, 51.653397, 52.94264]
2025-05-09 19:41:35,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [38.0, 37.0, 34.0, 35.0, 36.0, 36.0, 35.0, 35.0, 35.0, 38.0]
2025-05-09 19:41:35,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (52.52) for latency MM1Queue_a033_s075
2025-05-09 19:41:35,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 19:41:35,593 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 19:41:35,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 59 minutes, 28 seconds)
2025-05-09 19:44:13,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:44:16,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 201.61186 ± 105.248
2025-05-09 19:44:16,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [357.83572, 286.18103, 157.46158, 77.177055, 85.98568, 96.36135, 315.59277, 249.35918, 90.25645, 299.9078]
2025-05-09 19:44:16,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [300.0, 328.0, 161.0, 80.0, 105.0, 103.0, 298.0, 250.0, 103.0, 292.0]
2025-05-09 19:44:16,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (201.61) for latency MM1Queue_a033_s075
2025-05-09 19:44:16,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 19:44:16,559 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 19:44:16,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 9 minutes, 59 seconds)
2025-05-09 19:46:50,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:46:52,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 199.98276 ± 89.520
2025-05-09 19:46:52,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [254.07433, 112.95037, 101.110466, 255.11446, 113.38377, 294.7081, 298.31952, 115.41613, 122.42953, 332.3208]
2025-05-09 19:46:52,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 72.0, 73.0, 135.0, 78.0, 167.0, 159.0, 74.0, 85.0, 162.0]
2025-05-09 19:46:52,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 8 minutes, 44 seconds)
2025-05-09 19:49:26,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:49:28,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 207.35005 ± 131.807
2025-05-09 19:49:28,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [357.1299, 458.05362, 295.58856, 97.65843, 150.6853, 35.708916, 83.324455, 304.3191, 187.486, 103.546295]
2025-05-09 19:49:28,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [199.0, 285.0, 151.0, 64.0, 94.0, 34.0, 60.0, 159.0, 149.0, 72.0]
2025-05-09 19:49:28,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (207.35) for latency MM1Queue_a033_s075
2025-05-09 19:49:28,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 19:49:28,517 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 19:49:28,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 7 minutes, 13 seconds)
2025-05-09 19:52:02,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:52:21,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 985.84100 ± 3.616
2025-05-09 19:52:21,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [988.2334, 984.72864, 986.06934, 987.8122, 984.2981, 988.2887, 990.3913, 977.304, 982.65735, 988.6271]
2025-05-09 19:52:21,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:52:21,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (985.84) for latency MM1Queue_a033_s075
2025-05-09 19:52:21,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 19:52:21,635 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 19:52:21,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 10 minutes, 32 seconds)
2025-05-09 19:55:05,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:55:16,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 582.74963 ± 318.034
2025-05-09 19:55:16,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [970.4342, 461.4725, 931.79315, 402.04166, 300.31845, 963.5946, 216.89624, 973.7494, 192.75113, 414.44513]
2025-05-09 19:55:16,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 483.0, 1000.0, 396.0, 308.0, 1000.0, 233.0, 1000.0, 203.0, 428.0]
2025-05-09 19:55:16,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 17 minutes, 7 seconds)
2025-05-09 19:57:46,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:57:48,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 300.61066 ± 48.070
2025-05-09 19:57:48,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [186.61603, 316.56525, 331.07523, 313.43106, 313.7289, 297.51746, 239.92522, 363.53226, 317.4341, 326.28104]
2025-05-09 19:57:48,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 145.0, 147.0, 151.0, 143.0, 137.0, 116.0, 177.0, 145.0, 149.0]
2025-05-09 19:57:48,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 11 minutes, 47 seconds)
2025-05-09 20:00:25,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:00:27,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 223.21561 ± 44.116
2025-05-09 20:00:27,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [251.96478, 252.97212, 157.0005, 235.13414, 152.42467, 267.16046, 163.91182, 255.07489, 263.98947, 232.52321]
2025-05-09 20:00:27,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 127.0, 87.0, 120.0, 94.0, 124.0, 88.0, 122.0, 130.0, 111.0]
2025-05-09 20:00:27,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 10 minutes, 6 seconds)
2025-05-09 20:03:04,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:03:08,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 334.91074 ± 119.093
2025-05-09 20:03:08,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [135.34503, 503.56442, 286.98148, 463.62732, 143.2525, 313.6313, 324.93045, 350.00952, 365.35358, 462.41208]
2025-05-09 20:03:08,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 318.0, 237.0, 290.0, 90.0, 176.0, 166.0, 185.0, 200.0, 293.0]
2025-05-09 20:03:08,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 8 minutes, 34 seconds)
2025-05-09 20:05:42,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:05:45,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 442.29077 ± 63.728
2025-05-09 20:05:45,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [501.41812, 539.6306, 378.0052, 521.072, 392.12918, 497.8265, 363.3095, 372.6383, 417.7397, 439.1382]
2025-05-09 20:05:45,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 217.0, 174.0, 211.0, 204.0, 189.0, 174.0, 159.0, 173.0, 248.0]
2025-05-09 20:05:45,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 1 minute, 13 seconds)
2025-05-09 20:08:21,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:08:24,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 445.86597 ± 119.814
2025-05-09 20:08:24,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [502.31732, 592.6955, 493.14215, 178.19518, 450.72244, 464.07968, 373.91437, 445.9916, 615.22797, 342.37366]
2025-05-09 20:08:24,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 243.0, 200.0, 107.0, 204.0, 205.0, 177.0, 230.0, 257.0, 161.0]
2025-05-09 20:08:24,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 53 minutes, 51 seconds)
2025-05-09 20:11:01,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:11:04,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 441.79181 ± 234.669
2025-05-09 20:11:04,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [540.27325, 349.61887, 228.95302, 599.71735, 484.6026, 475.11404, 268.81564, 252.79617, 1019.9577, 198.06956]
2025-05-09 20:11:04,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [215.0, 152.0, 119.0, 321.0, 187.0, 202.0, 146.0, 126.0, 593.0, 104.0]
2025-05-09 20:11:04,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 53 minutes, 26 seconds)
2025-05-09 20:13:42,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:13:45,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 462.15289 ± 154.762
2025-05-09 20:13:45,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [539.1639, 304.9627, 533.7398, 506.8075, 483.87817, 500.97086, 88.344086, 653.3837, 405.69977, 604.5785]
2025-05-09 20:13:45,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 150.0, 189.0, 190.0, 204.0, 205.0, 55.0, 227.0, 220.0, 216.0]
2025-05-09 20:13:45,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 51 minutes, 24 seconds)
2025-05-09 20:16:21,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:16:26,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 779.33466 ± 382.809
2025-05-09 20:16:26,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [931.65985, 742.9308, 829.229, 1422.521, 312.6114, 365.07898, 120.11821, 924.9353, 1008.6722, 1135.59]
2025-05-09 20:16:26,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [370.0, 258.0, 311.0, 523.0, 160.0, 209.0, 79.0, 340.0, 399.0, 434.0]
2025-05-09 20:16:26,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 48 minutes, 49 seconds)
2025-05-09 20:19:05,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:19:08,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 532.76367 ± 211.187
2025-05-09 20:19:08,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [581.8194, 525.1901, 794.0511, 819.9353, 723.5091, 527.28973, 483.6583, 156.69469, 197.61775, 517.87146]
2025-05-09 20:19:08,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [219.0, 213.0, 296.0, 313.0, 263.0, 190.0, 183.0, 92.0, 114.0, 193.0]
2025-05-09 20:19:08,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 47 minutes, 35 seconds)
2025-05-09 20:21:45,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:21:49,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 603.92584 ± 306.221
2025-05-09 20:21:49,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [549.3517, 509.90723, 797.3424, 207.95, 641.4894, 379.36835, 1152.0555, 122.35036, 705.0192, 974.42444]
2025-05-09 20:21:49,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [232.0, 252.0, 266.0, 110.0, 259.0, 181.0, 461.0, 74.0, 317.0, 410.0]
2025-05-09 20:21:49,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 45 minutes, 26 seconds)
2025-05-09 20:24:24,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:24:28,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 484.16391 ± 274.312
2025-05-09 20:24:28,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [380.02808, 308.9667, 846.0094, 992.03503, 430.80426, 691.56537, 84.44887, 251.12259, 595.4947, 261.16394]
2025-05-09 20:24:28,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 151.0, 323.0, 364.0, 213.0, 324.0, 59.0, 129.0, 245.0, 142.0]
2025-05-09 20:24:28,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 42 minutes, 20 seconds)
2025-05-09 20:27:06,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:27:09,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 421.67197 ± 251.404
2025-05-09 20:27:09,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [302.7439, 137.82574, 291.77127, 956.1261, 379.86362, 701.9635, 260.3622, 375.3794, 656.1439, 154.54028]
2025-05-09 20:27:09,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 83.0, 134.0, 357.0, 178.0, 282.0, 135.0, 175.0, 270.0, 87.0]
2025-05-09 20:27:09,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 39 minutes, 39 seconds)
2025-05-09 20:29:49,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:29:54,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 711.59363 ± 435.661
2025-05-09 20:29:54,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [476.6466, 523.48157, 1332.886, 1367.8086, 1195.8389, 452.22644, 320.52066, 956.05554, 389.03143, 101.44025]
2025-05-09 20:29:54,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 222.0, 478.0, 591.0, 538.0, 225.0, 165.0, 409.0, 204.0, 66.0]
2025-05-09 20:29:54,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 38 minutes, 12 seconds)
2025-05-09 20:32:29,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:32:34,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 708.38159 ± 374.980
2025-05-09 20:32:34,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [836.2773, 580.6192, 1158.4445, 1479.8608, 913.9654, 659.4286, 301.92944, 295.38516, 289.1408, 568.765]
2025-05-09 20:32:34,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [377.0, 250.0, 436.0, 591.0, 351.0, 262.0, 156.0, 162.0, 158.0, 245.0]
2025-05-09 20:32:34,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 34 minutes, 53 seconds)
2025-05-09 20:35:12,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:35:15,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 462.60117 ± 252.003
2025-05-09 20:35:15,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [272.3109, 676.79956, 170.4475, 309.64835, 140.7012, 814.3804, 628.1222, 875.5292, 342.27698, 395.7956]
2025-05-09 20:35:15,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 267.0, 101.0, 152.0, 76.0, 329.0, 250.0, 312.0, 175.0, 185.0]
2025-05-09 20:35:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 32 minutes, 19 seconds)
2025-05-09 20:37:57,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:38:01,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 568.95398 ± 273.981
2025-05-09 20:38:01,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [672.6551, 726.4756, 684.12836, 680.6793, 1056.1414, 187.91699, 183.6282, 645.24615, 662.78973, 189.87883]
2025-05-09 20:38:01,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [265.0, 268.0, 260.0, 276.0, 338.0, 104.0, 103.0, 262.0, 248.0, 101.0]
2025-05-09 20:38:01,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 31 minutes, 23 seconds)
2025-05-09 20:40:35,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:40:40,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 798.94415 ± 377.929
2025-05-09 20:40:40,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [999.91644, 267.10928, 678.16925, 1220.4772, 533.7144, 1366.328, 1306.6014, 339.38077, 652.17084, 625.573]
2025-05-09 20:40:40,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [383.0, 161.0, 269.0, 506.0, 217.0, 520.0, 511.0, 148.0, 244.0, 282.0]
2025-05-09 20:40:40,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 28 minutes, 21 seconds)
2025-05-09 20:43:25,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:43:31,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 916.19031 ± 350.386
2025-05-09 20:43:31,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1324.5284, 442.95416, 1253.4026, 704.356, 879.4431, 886.6384, 887.4939, 349.18713, 922.8644, 1511.0354]
2025-05-09 20:43:31,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [524.0, 193.0, 450.0, 309.0, 317.0, 357.0, 347.0, 155.0, 332.0, 584.0]
2025-05-09 20:43:31,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 26 minutes, 58 seconds)
2025-05-09 20:46:11,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:46:15,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 635.57532 ± 409.962
2025-05-09 20:46:15,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [90.52021, 1171.6896, 628.69867, 129.43227, 1133.9125, 257.04642, 1059.8804, 267.92276, 999.9591, 616.69135]
2025-05-09 20:46:15,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 391.0, 247.0, 76.0, 362.0, 138.0, 386.0, 144.0, 395.0, 247.0]
2025-05-09 20:46:15,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 25 minutes, 7 seconds)
2025-05-09 20:48:54,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:48:58,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 644.26941 ± 457.364
2025-05-09 20:48:58,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [819.719, 217.04037, 139.85915, 698.85376, 499.75873, 672.03613, 599.9513, 193.68346, 785.07837, 1816.714]
2025-05-09 20:48:58,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [343.0, 126.0, 87.0, 306.0, 248.0, 317.0, 258.0, 105.0, 264.0, 724.0]
2025-05-09 20:48:58,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 22 minutes, 55 seconds)
2025-05-09 20:51:34,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:51:43,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1323.83423 ± 673.319
2025-05-09 20:51:43,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1213.0396, 877.31396, 699.4664, 869.3412, 720.3793, 771.19275, 1691.3157, 2426.007, 1340.3678, 2629.9192]
2025-05-09 20:51:43,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [494.0, 335.0, 256.0, 335.0, 268.0, 315.0, 611.0, 873.0, 505.0, 1000.0]
2025-05-09 20:51:43,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (1323.83) for latency MM1Queue_a033_s075
2025-05-09 20:51:43,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 20:51:43,251 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 20:51:43,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 20 minutes)
2025-05-09 20:54:23,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:54:32,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1312.18323 ± 947.987
2025-05-09 20:54:32,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2604.538, 365.017, 299.71545, 134.52469, 1003.9921, 2367.574, 1529.1403, 1658.8264, 463.7261, 2694.7776]
2025-05-09 20:54:32,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 167.0, 134.0, 84.0, 372.0, 907.0, 590.0, 639.0, 203.0, 1000.0]
2025-05-09 20:54:32,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 19 minutes, 36 seconds)
2025-05-09 20:57:22,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:57:28,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 907.28400 ± 946.818
2025-05-09 20:57:28,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [370.7972, 137.32645, 510.56424, 1025.8528, 89.15383, 1111.5193, 2877.6973, 148.79195, 2462.8772, 338.25928]
2025-05-09 20:57:28,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [164.0, 83.0, 203.0, 313.0, 57.0, 397.0, 1000.0, 86.0, 817.0, 145.0]
2025-05-09 20:57:28,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 18 minutes)
2025-05-09 21:00:02,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:00:09,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1163.15991 ± 828.844
2025-05-09 21:00:09,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1907.358, 2155.7124, 152.87273, 448.1095, 2595.3538, 201.14075, 443.55832, 1657.6349, 1151.0863, 918.77264]
2025-05-09 21:00:09,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [643.0, 722.0, 88.0, 195.0, 889.0, 106.0, 195.0, 595.0, 427.0, 309.0]
2025-05-09 21:00:09,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 14 minutes, 34 seconds)
2025-05-09 21:02:42,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:02:47,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 841.33557 ± 331.531
2025-05-09 21:02:47,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [999.81665, 231.86949, 685.8714, 1204.5723, 832.13086, 943.58246, 905.2492, 327.3953, 1350.6935, 932.1743]
2025-05-09 21:02:47,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [385.0, 116.0, 256.0, 482.0, 262.0, 315.0, 293.0, 147.0, 523.0, 376.0]
2025-05-09 21:02:47,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 10 minutes, 45 seconds)
2025-05-09 21:05:25,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:05:32,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1174.50952 ± 485.030
2025-05-09 21:05:32,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1911.9355, 780.99274, 526.7005, 805.93976, 1828.5347, 1562.3572, 933.3791, 914.9449, 800.794, 1679.5175]
2025-05-09 21:05:32,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [666.0, 304.0, 202.0, 305.0, 633.0, 506.0, 358.0, 334.0, 290.0, 542.0]
2025-05-09 21:05:32,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 7 minutes, 55 seconds)
2025-05-09 21:08:15,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:08:24,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1471.84131 ± 816.953
2025-05-09 21:08:24,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1367.3132, 1442.1158, 2567.5632, 2862.3354, 1558.7391, 1264.3281, 597.0429, 2153.1082, 746.08716, 159.77977]
2025-05-09 21:08:24,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [490.0, 487.0, 863.0, 1000.0, 564.0, 437.0, 226.0, 776.0, 290.0, 96.0]
2025-05-09 21:08:24,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (1471.84) for latency MM1Queue_a033_s075
2025-05-09 21:08:24,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 21:08:24,717 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 21:08:24,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 5 minutes, 49 seconds)
2025-05-09 21:11:01,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:11:09,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1352.20081 ± 651.525
2025-05-09 21:11:09,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1187.8868, 1426.9303, 402.1574, 1110.2931, 880.3272, 2262.9568, 903.16583, 1289.6597, 1295.6592, 2762.9717]
2025-05-09 21:11:09,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [371.0, 448.0, 184.0, 419.0, 306.0, 748.0, 307.0, 451.0, 421.0, 1000.0]
2025-05-09 21:11:09,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 42 seconds)
2025-05-09 21:13:49,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:13:54,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 919.98816 ± 568.307
2025-05-09 21:13:54,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [150.22838, 1425.7208, 900.8482, 888.96533, 169.40532, 1266.3026, 1124.6783, 347.78644, 2070.3535, 855.59235]
2025-05-09 21:13:54,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 444.0, 272.0, 279.0, 95.0, 393.0, 343.0, 158.0, 711.0, 270.0]
2025-05-09 21:13:54,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 58 minutes, 48 seconds)
2025-05-09 21:16:39,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:16:50,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1759.11255 ± 703.071
2025-05-09 21:16:50,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1267.872, 1512.0016, 1661.1471, 2769.9214, 2489.0447, 1871.9998, 1675.1046, 2604.2522, 253.01051, 1486.7706]
2025-05-09 21:16:50,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [448.0, 550.0, 571.0, 975.0, 868.0, 716.0, 581.0, 887.0, 120.0, 532.0]
2025-05-09 21:16:50,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (1759.11) for latency MM1Queue_a033_s075
2025-05-09 21:16:50,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 21:16:50,297 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 21:16:50,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 59 minutes, 42 seconds)
2025-05-09 21:19:24,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:19:37,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2089.46143 ± 992.956
2025-05-09 21:19:37,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2809.1572, 2350.033, 3000.0232, 2952.6028, 2930.2097, 1712.1818, 239.64339, 891.54517, 1009.82635, 2999.3926]
2025-05-09 21:19:37,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 794.0, 1000.0, 1000.0, 1000.0, 557.0, 121.0, 338.0, 318.0, 992.0]
2025-05-09 21:19:37,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (2089.46) for latency MM1Queue_a033_s075
2025-05-09 21:19:37,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 21:19:37,031 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 21:19:37,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 57 minutes, 23 seconds)
2025-05-09 21:22:14,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:22:23,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1449.76074 ± 1058.929
2025-05-09 21:22:23,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2959.352, 731.8081, 1090.449, 2990.5513, 965.91284, 358.74792, 3000.1152, 1044.0173, 72.941185, 1283.7128]
2025-05-09 21:22:23,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [992.0, 263.0, 380.0, 1000.0, 299.0, 157.0, 1000.0, 376.0, 47.0, 421.0]
2025-05-09 21:22:23,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 53 minutes, 21 seconds)
2025-05-09 21:25:11,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:25:21,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1769.42346 ± 821.652
2025-05-09 21:25:21,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1031.4622, 2797.7876, 1283.9795, 2161.028, 2289.0713, 2571.1824, 2666.8994, 1162.6218, 1585.475, 144.72713]
2025-05-09 21:25:21,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [371.0, 992.0, 443.0, 714.0, 809.0, 874.0, 895.0, 388.0, 525.0, 84.0]
2025-05-09 21:25:21,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 53 minutes, 16 seconds)
2025-05-09 21:28:00,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:28:08,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1299.21631 ± 861.735
2025-05-09 21:28:08,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [197.75133, 1390.8457, 434.7159, 345.4563, 1629.2827, 2626.9128, 2820.126, 1543.862, 751.2653, 1251.9446]
2025-05-09 21:28:08,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 457.0, 175.0, 148.0, 513.0, 938.0, 965.0, 538.0, 276.0, 480.0]
2025-05-09 21:28:08,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 50 minutes, 47 seconds)
2025-05-09 21:30:41,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:30:51,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1732.78125 ± 696.750
2025-05-09 21:30:51,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1171.6748, 2797.5796, 2841.0874, 1024.49, 2125.8496, 1871.4376, 1068.2753, 1039.4197, 1156.9282, 2231.0713]
2025-05-09 21:30:51,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [369.0, 1000.0, 1000.0, 386.0, 729.0, 615.0, 394.0, 330.0, 401.0, 803.0]
2025-05-09 21:30:51,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 45 minutes, 28 seconds)
2025-05-09 21:33:43,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:33:51,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1277.62036 ± 1138.482
2025-05-09 21:33:51,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [133.6877, 597.62555, 362.48218, 3011.6719, 2829.2224, 441.4385, 1068.2411, 1124.1168, 3010.8206, 196.89658]
2025-05-09 21:33:51,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 214.0, 152.0, 1000.0, 934.0, 194.0, 336.0, 358.0, 1000.0, 97.0]
2025-05-09 21:33:51,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 45 minutes, 10 seconds)
2025-05-09 21:36:20,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:36:26,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1149.07544 ± 405.586
2025-05-09 21:36:26,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1113.8448, 798.9303, 2027.7693, 1422.6462, 899.0135, 619.3284, 1397.4065, 1134.7703, 1388.3344, 688.7114]
2025-05-09 21:36:26,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [338.0, 294.0, 645.0, 491.0, 330.0, 234.0, 436.0, 369.0, 436.0, 262.0]
2025-05-09 21:36:26,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 40 minutes, 8 seconds)
2025-05-09 21:39:07,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:39:19,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1976.62305 ± 656.820
2025-05-09 21:39:19,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2144.2356, 2040.8337, 2159.1304, 1618.4454, 1677.2072, 720.6137, 3016.1929, 3023.635, 1455.2623, 1910.6754]
2025-05-09 21:39:19,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [674.0, 672.0, 739.0, 479.0, 517.0, 277.0, 1000.0, 1000.0, 441.0, 668.0]
2025-05-09 21:39:19,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 36 minutes, 19 seconds)
2025-05-09 21:42:00,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:42:08,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1333.14978 ± 531.510
2025-05-09 21:42:08,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1552.5975, 2185.6702, 845.77435, 1533.337, 1571.1106, 125.55722, 1269.0082, 1141.0865, 1797.907, 1309.4487]
2025-05-09 21:42:08,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [550.0, 690.0, 321.0, 474.0, 510.0, 72.0, 467.0, 342.0, 620.0, 416.0]
2025-05-09 21:42:08,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 33 minutes, 57 seconds)
2025-05-09 21:44:47,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:44:54,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1195.00293 ± 1034.926
2025-05-09 21:44:54,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [461.45724, 437.52612, 899.40717, 450.1684, 2839.95, 453.78415, 900.03864, 150.94206, 2429.7954, 2926.9597]
2025-05-09 21:44:54,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [185.0, 180.0, 276.0, 185.0, 1000.0, 185.0, 282.0, 86.0, 848.0, 1000.0]
2025-05-09 21:44:54,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 31 minutes, 42 seconds)
2025-05-09 21:47:36,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:47:47,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1840.01782 ± 979.931
2025-05-09 21:47:47,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2931.129, 2786.9468, 2954.708, 330.1497, 1550.2837, 1723.849, 523.3859, 2954.8684, 1760.3606, 884.4953]
2025-05-09 21:47:47,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 910.0, 1000.0, 141.0, 560.0, 560.0, 207.0, 1000.0, 624.0, 299.0]
2025-05-09 21:47:47,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 27 minutes, 43 seconds)
2025-05-09 21:50:19,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:50:24,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 952.80334 ± 573.454
2025-05-09 21:50:24,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [915.44324, 1168.8755, 859.06775, 1039.9285, 1584.0371, 403.01715, 467.10815, 2259.307, 456.52505, 374.7241]
2025-05-09 21:50:24,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [285.0, 371.0, 295.0, 320.0, 482.0, 178.0, 170.0, 720.0, 175.0, 160.0]
2025-05-09 21:50:24,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 25 minutes, 16 seconds)
2025-05-09 21:53:18,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:53:29,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1966.73376 ± 794.782
2025-05-09 21:53:29,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [808.743, 1150.6279, 2974.2612, 2103.7827, 2990.9636, 3002.577, 2208.3726, 1171.8643, 1285.7883, 1970.3568]
2025-05-09 21:53:29,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [275.0, 374.0, 1000.0, 729.0, 1000.0, 1000.0, 714.0, 364.0, 399.0, 633.0]
2025-05-09 21:53:29,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 24 minutes, 38 seconds)
2025-05-09 21:56:04,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:56:08,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 772.93750 ± 455.642
2025-05-09 21:56:08,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [202.9318, 260.71826, 1103.2194, 1115.827, 442.54788, 1101.5184, 1522.0878, 739.7197, 153.4058, 1087.399]
2025-05-09 21:56:08,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [98.0, 125.0, 355.0, 349.0, 185.0, 348.0, 475.0, 264.0, 95.0, 346.0]
2025-05-09 21:56:08,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 19 minutes, 59 seconds)
2025-05-09 21:58:43,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:58:52,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1527.09802 ± 753.494
2025-05-09 21:58:52,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1751.7311, 382.98108, 616.07855, 2213.1057, 1312.0148, 863.63776, 1675.8145, 3034.3389, 2002.01, 1419.2682]
2025-05-09 21:58:52,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [549.0, 168.0, 238.0, 734.0, 430.0, 317.0, 568.0, 1000.0, 625.0, 461.0]
2025-05-09 21:58:52,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 16 minutes, 49 seconds)
2025-05-09 22:01:29,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:01:36,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1243.19470 ± 698.846
2025-05-09 22:01:36,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [964.5093, 1347.6885, 823.51227, 1980.4506, 1171.82, 152.89903, 1590.2091, 1827.525, 187.55028, 2385.7847]
2025-05-09 22:01:36,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [304.0, 416.0, 276.0, 628.0, 361.0, 84.0, 501.0, 581.0, 99.0, 757.0]
2025-05-09 22:01:36,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 12 minutes, 36 seconds)
2025-05-09 22:04:12,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:04:18,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1010.33972 ± 446.311
2025-05-09 22:04:18,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1522.4323, 838.78326, 1096.3348, 1099.1787, 792.71545, 724.02277, 586.23804, 1458.6464, 1758.7363, 226.30989]
2025-05-09 22:04:18,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [467.0, 269.0, 339.0, 362.0, 289.0, 268.0, 223.0, 452.0, 547.0, 113.0]
2025-05-09 22:04:18,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 10 minutes, 37 seconds)
2025-05-09 22:07:09,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:07:16,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1220.82104 ± 757.133
2025-05-09 22:07:16,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1811.9777, 2946.6882, 1318.3138, 315.399, 1354.4243, 130.50679, 1507.575, 1113.9574, 796.6677, 912.7009]
2025-05-09 22:07:16,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [617.0, 1000.0, 409.0, 132.0, 424.0, 80.0, 531.0, 358.0, 243.0, 291.0]
2025-05-09 22:07:16,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 6 minutes, 45 seconds)
2025-05-09 22:09:54,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:10:00,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1119.39978 ± 546.166
2025-05-09 22:10:00,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1060.8903, 1181.4784, 1072.5104, 1382.2847, 1598.5812, 1899.4983, 140.91077, 144.22609, 1206.6558, 1506.9633]
2025-05-09 22:10:00,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [333.0, 377.0, 342.0, 444.0, 498.0, 601.0, 79.0, 80.0, 375.0, 468.0]
2025-05-09 22:10:00,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 4 minutes, 46 seconds)
2025-05-09 22:12:32,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:12:37,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1086.71594 ± 428.149
2025-05-09 22:12:37,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [838.58734, 1310.3727, 1099.9412, 1193.2434, 1342.5828, 352.55667, 1107.3567, 1354.2246, 1851.4532, 416.8408]
2025-05-09 22:12:37,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [306.0, 399.0, 351.0, 369.0, 423.0, 155.0, 344.0, 428.0, 571.0, 182.0]
2025-05-09 22:12:37,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 1 minute, 6 seconds)
2025-05-09 22:15:19,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:15:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1702.52014 ± 821.437
2025-05-09 22:15:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2119.972, 2803.8267, 1012.86816, 1766.1599, 1645.274, 1160.719, 1065.8779, 239.32227, 3056.484, 2154.698]
2025-05-09 22:15:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [686.0, 891.0, 317.0, 562.0, 570.0, 360.0, 373.0, 120.0, 1000.0, 680.0]
2025-05-09 22:15:29,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 59 minutes, 20 seconds)
2025-05-09 22:18:02,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:18:11,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1557.57104 ± 814.347
2025-05-09 22:18:11,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1597.8263, 565.7168, 3007.7913, 765.48517, 2028.1666, 1491.1641, 1177.8417, 814.964, 2924.3262, 1202.4297]
2025-05-09 22:18:11,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [536.0, 218.0, 1000.0, 287.0, 689.0, 525.0, 360.0, 304.0, 1000.0, 369.0]
2025-05-09 22:18:11,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 56 minutes, 42 seconds)
2025-05-09 22:20:58,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:21:10,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2045.15076 ± 1043.329
2025-05-09 22:21:10,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2992.2727, 2954.8845, 2965.8901, 1527.4795, 343.1737, 2004.7454, 354.0687, 1333.1411, 2954.5808, 3021.2705]
2025-05-09 22:21:10,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 535.0, 160.0, 612.0, 164.0, 443.0, 1000.0, 1000.0]
2025-05-09 22:21:10,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 53 minutes, 57 seconds)
2025-05-09 22:23:48,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:24:02,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2314.04443 ± 734.779
2025-05-09 22:24:02,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3142.2458, 2192.2725, 2996.8757, 1975.0988, 1450.003, 3026.0884, 1437.024, 3142.6567, 1165.6963, 2612.4836]
2025-05-09 22:24:02,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 732.0, 1000.0, 603.0, 501.0, 1000.0, 504.0, 999.0, 409.0, 818.0]
2025-05-09 22:24:02,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (2314.04) for latency MM1Queue_a033_s075
2025-05-09 22:24:02,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 22:24:02,277 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:24:02,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 52 minutes, 17 seconds)
2025-05-09 22:26:54,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:27:05,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1836.91248 ± 1057.116
2025-05-09 22:27:05,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2959.6125, 160.43808, 2998.6746, 1618.0312, 1161.5717, 637.35895, 2941.2334, 2203.5134, 735.9046, 2952.7856]
2025-05-09 22:27:05,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 94.0, 1000.0, 571.0, 424.0, 248.0, 1000.0, 700.0, 270.0, 1000.0]
2025-05-09 22:27:05,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 52 minutes, 44 seconds)
2025-05-09 22:29:34,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:29:45,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1893.86938 ± 928.078
2025-05-09 22:29:45,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1281.5271, 1640.5126, 3002.2603, 1382.6273, 3079.086, 619.67596, 3033.7634, 1890.9011, 491.04837, 2517.2913]
2025-05-09 22:29:45,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [454.0, 525.0, 1000.0, 425.0, 1000.0, 241.0, 1000.0, 602.0, 201.0, 786.0]
2025-05-09 22:29:45,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 48 minutes, 28 seconds)
2025-05-09 22:32:34,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:32:46,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2007.19043 ± 987.157
2025-05-09 22:32:46,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1593.7732, 3005.6655, 3088.705, 1576.4875, 573.40735, 2978.871, 1547.6025, 2388.549, 313.44583, 3005.3984]
2025-05-09 22:32:46,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [504.0, 1000.0, 1000.0, 485.0, 213.0, 1000.0, 539.0, 741.0, 144.0, 1000.0]
2025-05-09 22:32:46,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 47 minutes, 48 seconds)
2025-05-09 22:35:27,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:35:37,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1656.10474 ± 990.422
2025-05-09 22:35:37,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [526.0119, 2988.7761, 2940.265, 2939.749, 1768.3677, 2058.039, 352.95712, 1280.6534, 1132.8689, 573.35834]
2025-05-09 22:35:37,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [210.0, 1000.0, 1000.0, 1000.0, 603.0, 702.0, 148.0, 448.0, 355.0, 214.0]
2025-05-09 22:35:37,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 44 minutes, 2 seconds)
2025-05-09 22:38:13,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:38:23,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1796.76233 ± 1034.993
2025-05-09 22:38:23,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2645.1978, 1624.0922, 99.909065, 638.466, 2858.8716, 821.86334, 3066.556, 1158.3551, 1990.4548, 3063.8574]
2025-05-09 22:38:23,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [818.0, 497.0, 68.0, 232.0, 942.0, 287.0, 1000.0, 362.0, 672.0, 1000.0]
2025-05-09 22:38:23,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 40 minutes, 27 seconds)
2025-05-09 22:41:10,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:41:17,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1241.53711 ± 1120.110
2025-05-09 22:41:17,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [565.34235, 3101.5286, 154.95335, 120.33595, 3025.1365, 1378.5135, 374.68277, 821.32794, 439.8661, 2433.685]
2025-05-09 22:41:17,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [219.0, 1000.0, 89.0, 78.0, 946.0, 462.0, 164.0, 284.0, 187.0, 756.0]
2025-05-09 22:41:17,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 36 minutes, 37 seconds)
2025-05-09 22:43:57,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:44:08,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1897.71191 ± 1002.626
2025-05-09 22:44:08,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3056.1636, 1404.5675, 834.2092, 1496.9585, 647.4104, 3071.4468, 2725.7537, 3053.1582, 2238.2432, 449.20938]
2025-05-09 22:44:08,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 481.0, 286.0, 506.0, 245.0, 1000.0, 896.0, 1000.0, 721.0, 181.0]
2025-05-09 22:44:08,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 34 minutes, 53 seconds)
2025-05-09 22:46:35,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:46:44,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1567.67236 ± 1182.468
2025-05-09 22:46:44,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [272.69098, 1141.7454, 3129.3267, 2585.6157, 3067.4844, 3013.2898, 914.8253, 1135.0125, 187.60439, 229.12784]
2025-05-09 22:46:44,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 354.0, 1000.0, 857.0, 1000.0, 1000.0, 327.0, 401.0, 102.0, 106.0]
2025-05-09 22:46:44,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 29 minutes, 24 seconds)
2025-05-09 22:49:25,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:49:39,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2415.35645 ± 908.388
2025-05-09 22:49:39,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2570.244, 3087.8276, 3054.8606, 1372.7579, 3029.1829, 2787.1318, 2257.4219, 3055.3298, 2799.8801, 138.92696]
2025-05-09 22:49:39,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [828.0, 1000.0, 1000.0, 431.0, 1000.0, 869.0, 713.0, 1000.0, 879.0, 85.0]
2025-05-09 22:49:39,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (2415.36) for latency MM1Queue_a033_s075
2025-05-09 22:49:39,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 22:49:39,893 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:49:39,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 27 minutes, 3 seconds)
2025-05-09 22:52:20,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:52:29,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1517.41650 ± 980.536
2025-05-09 22:52:29,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1628.1935, 356.5931, 1859.1268, 198.19806, 3005.9573, 353.58923, 3044.3064, 1360.8901, 2139.8406, 1227.4696]
2025-05-09 22:52:29,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [546.0, 161.0, 627.0, 110.0, 1000.0, 160.0, 1000.0, 442.0, 646.0, 430.0]
2025-05-09 22:52:29,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 24 minutes, 38 seconds)
2025-05-09 22:55:11,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:55:24,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2229.14185 ± 1119.768
2025-05-09 22:55:24,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [175.24918, 353.11395, 1704.2351, 3056.7266, 3000.6355, 3055.6367, 3198.451, 1670.2911, 3038.5134, 3038.5654]
2025-05-09 22:55:24,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 154.0, 627.0, 1000.0, 1000.0, 1000.0, 1000.0, 577.0, 1000.0, 1000.0]
2025-05-09 22:55:24,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 21 minutes, 52 seconds)
2025-05-09 22:58:03,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:58:10,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1301.78467 ± 548.218
2025-05-09 22:58:10,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1149.1707, 1187.2948, 2090.0408, 1173.6227, 1814.2988, 464.6087, 1489.6393, 579.51483, 942.3161, 2127.3396]
2025-05-09 22:58:10,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [355.0, 372.0, 644.0, 394.0, 603.0, 190.0, 468.0, 219.0, 320.0, 657.0]
2025-05-09 22:58:10,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 18 minutes, 38 seconds)
2025-05-09 23:00:59,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:01:14,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2449.16260 ± 925.278
2025-05-09 23:01:14,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3010.1147, 160.97849, 2365.6196, 1235.2614, 2921.2192, 2924.3186, 2956.4666, 2998.3638, 2988.2737, 2931.0112]
2025-05-09 23:01:14,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 88.0, 809.0, 452.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:01:14,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (2449.16) for latency MM1Queue_a033_s075
2025-05-09 23:01:14,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 23:01:14,175 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:01:14,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 18 minutes, 17 seconds)
2025-05-09 23:03:50,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:03:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1324.73950 ± 606.885
2025-05-09 23:03:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2358.2026, 1241.8218, 622.6336, 2404.9797, 1169.2887, 1388.5079, 1161.1807, 1352.588, 1152.3289, 395.8626]
2025-05-09 23:03:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [765.0, 390.0, 235.0, 740.0, 362.0, 443.0, 374.0, 415.0, 366.0, 169.0]
2025-05-09 23:03:57,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 14 minutes, 19 seconds)
2025-05-09 23:06:44,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:06:56,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1965.58105 ± 1065.046
2025-05-09 23:06:56,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3058.8875, 3026.814, 2281.8745, 878.2919, 347.17026, 2970.7278, 3059.8142, 1039.7855, 597.6569, 2394.7898]
2025-05-09 23:06:56,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 713.0, 320.0, 150.0, 988.0, 1000.0, 369.0, 238.0, 788.0]
2025-05-09 23:06:56,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 12 minutes, 12 seconds)
2025-05-09 23:09:40,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:09:52,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2274.48511 ± 805.316
2025-05-09 23:09:52,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1007.6992, 1268.4498, 1738.1515, 3105.8533, 1913.0454, 3147.6362, 3119.479, 1645.8795, 2668.4507, 3130.206]
2025-05-09 23:09:52,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [302.0, 390.0, 516.0, 1000.0, 612.0, 1000.0, 986.0, 505.0, 821.0, 1000.0]
2025-05-09 23:09:52,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 9 minutes, 25 seconds)
2025-05-09 23:12:23,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:12:34,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1892.04431 ± 1221.566
2025-05-09 23:12:34,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [283.5824, 3073.1855, 3111.083, 3029.586, 611.1286, 1380.2744, 3068.2112, 1133.875, 175.08398, 3054.4326]
2025-05-09 23:12:34,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 1000.0, 1000.0, 1000.0, 237.0, 475.0, 1000.0, 401.0, 99.0, 1000.0]
2025-05-09 23:12:34,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 6 minutes, 12 seconds)
2025-05-09 23:15:17,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:15:30,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2190.29956 ± 1302.577
2025-05-09 23:15:30,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [130.91423, 3027.4583, 242.76126, 3075.6226, 230.5253, 2998.8516, 3084.4126, 3066.485, 3031.2332, 3014.7317]
2025-05-09 23:15:30,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 1000.0, 121.0, 1000.0, 118.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:15:30,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 2 minutes, 47 seconds)
2025-05-09 23:18:14,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:18:32,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 3035.65649 ± 103.795
2025-05-09 23:18:32,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3106.0745, 3048.7156, 3098.7285, 3103.7454, 2853.7183, 3051.6724, 2822.1213, 3064.138, 3153.4795, 3054.1704]
2025-05-09 23:18:32,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 895.0, 1000.0, 902.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:18:32,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (3035.66) for latency MM1Queue_a033_s075
2025-05-09 23:18:32,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-09 23:18:32,039 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:18:32,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 1 minute, 12 seconds)
2025-05-09 23:21:10,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:21:23,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2174.21631 ± 1250.371
2025-05-09 23:21:23,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3039.9285, 3202.8022, 376.79858, 3158.6362, 1798.1183, 350.14072, 3143.1885, 3161.2695, 360.15775, 3151.124]
2025-05-09 23:21:23,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [928.0, 1000.0, 170.0, 1000.0, 597.0, 152.0, 1000.0, 1000.0, 160.0, 1000.0]
2025-05-09 23:21:23,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 57 minutes, 47 seconds)
2025-05-09 23:24:03,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:24:10,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1253.81348 ± 1051.051
2025-05-09 23:24:10,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [651.4417, 1621.5123, 436.8399, 1863.2179, 127.488945, 907.6421, 421.3658, 3087.5596, 359.79654, 3061.269]
2025-05-09 23:24:10,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [247.0, 547.0, 188.0, 636.0, 75.0, 333.0, 184.0, 1000.0, 161.0, 1000.0]
2025-05-09 23:24:10,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 54 minutes, 21 seconds)
2025-05-09 23:27:01,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:27:18,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2774.58838 ± 807.412
2025-05-09 23:27:18,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2836.4875, 3076.7344, 3059.6028, 3072.4773, 3041.0894, 3071.6953, 3064.1711, 3068.507, 361.5489, 3093.568]
2025-05-09 23:27:18,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [920.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 997.0, 160.0, 1000.0]
2025-05-09 23:27:18,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 53 minutes, 2 seconds)
2025-05-09 23:29:47,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:29:56,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1739.93286 ± 737.357
2025-05-09 23:29:56,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3127.5117, 1668.3179, 1889.6755, 1186.1489, 930.383, 1274.0493, 3105.6755, 1488.9083, 1146.9542, 1581.7029]
2025-05-09 23:29:56,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 527.0, 626.0, 372.0, 336.0, 422.0, 1000.0, 458.0, 399.0, 514.0]
2025-05-09 23:29:56,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 49 minutes, 5 seconds)
2025-05-09 23:32:37,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:32:47,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1702.82983 ± 1374.143
2025-05-09 23:32:47,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3072.5347, 3066.3594, 195.4054, 3056.8079, 331.96063, 188.4546, 3091.845, 201.52782, 3051.5815, 771.8217]
2025-05-09 23:32:47,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 103.0, 1000.0, 155.0, 98.0, 1000.0, 109.0, 1000.0, 293.0]
2025-05-09 23:32:47,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 45 minutes, 38 seconds)
2025-05-09 23:35:36,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:35:49,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2164.14697 ± 1055.301
2025-05-09 23:35:49,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3064.0364, 1248.261, 791.0854, 3003.183, 3068.7888, 3088.37, 970.62024, 2771.239, 554.3528, 3081.5337]
2025-05-09 23:35:49,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 390.0, 296.0, 992.0, 1000.0, 1000.0, 352.0, 899.0, 221.0, 1000.0]
2025-05-09 23:35:49,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 43 minutes, 19 seconds)
2025-05-09 23:38:29,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:38:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1682.80176 ± 1185.615
2025-05-09 23:38:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [761.63336, 1395.6681, 2969.4934, 3036.8687, 368.08777, 3015.3875, 86.36763, 1774.0278, 373.4416, 3047.041]
2025-05-09 23:38:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [302.0, 501.0, 991.0, 1000.0, 166.0, 1000.0, 60.0, 592.0, 170.0, 1000.0]
2025-05-09 23:38:39,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 40 minutes, 33 seconds)
2025-05-09 23:41:20,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:41:37,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2862.30933 ± 592.908
2025-05-09 23:41:37,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3093.6296, 3079.998, 3111.3933, 1108.0231, 3088.728, 3100.2495, 3092.1538, 2768.1482, 3116.1367, 3064.6328]
2025-05-09 23:41:37,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 364.0, 1000.0, 1000.0, 1000.0, 898.0, 1000.0, 1000.0]
2025-05-09 23:41:37,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 37 minutes, 12 seconds)
2025-05-09 23:44:14,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:44:23,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1459.41138 ± 1271.979
2025-05-09 23:44:23,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [411.84482, 794.0672, 379.29312, 331.43637, 3072.9673, 315.3157, 2811.305, 3033.9712, 3108.14, 335.77228]
2025-05-09 23:44:23,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 307.0, 165.0, 149.0, 1000.0, 140.0, 911.0, 1000.0, 1000.0, 151.0]
2025-05-09 23:44:23,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 34 minutes, 39 seconds)
2025-05-09 23:47:06,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:47:20,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2184.23315 ± 1331.972
2025-05-09 23:47:20,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [161.7703, 134.2023, 154.69827, 3071.8574, 3090.2134, 2968.254, 3060.847, 3063.2703, 3049.4705, 3087.751]
2025-05-09 23:47:20,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 79.0, 85.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:47:20,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 31 minutes, 59 seconds)
2025-05-09 23:50:13,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:50:27,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2387.70752 ± 760.849
2025-05-09 23:50:27,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3045.0918, 787.9738, 3053.3914, 2066.7231, 2442.2068, 3042.5625, 2442.955, 2713.28, 3032.8784, 1250.0118]
2025-05-09 23:50:27,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 293.0, 1000.0, 644.0, 798.0, 1000.0, 807.0, 885.0, 1000.0, 428.0]
2025-05-09 23:50:27,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 29 minutes, 16 seconds)
2025-05-09 23:52:56,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:53:10,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2473.68677 ± 973.385
2025-05-09 23:53:10,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2654.3167, 1853.7848, 1391.1017, 3143.323, 3080.296, 3081.276, 3194.4204, 148.14622, 3080.5012, 3109.701]
2025-05-09 23:53:10,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [868.0, 553.0, 428.0, 1000.0, 1000.0, 1000.0, 971.0, 85.0, 1000.0, 1000.0]
2025-05-09 23:53:10,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 26 minutes, 6 seconds)
2025-05-09 23:55:59,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:56:11,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2218.48486 ± 1146.465
2025-05-09 23:56:11,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1001.999, 114.151855, 3128.916, 1013.0696, 3131.6558, 3118.892, 3160.1897, 3105.4587, 1301.0035, 3109.5127]
2025-05-09 23:56:11,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [306.0, 64.0, 1000.0, 350.0, 1000.0, 1000.0, 1000.0, 1000.0, 387.0, 1000.0]
2025-05-09 23:56:11,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 23 minutes, 19 seconds)
2025-05-09 23:58:56,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:59:09,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2273.01123 ± 1043.613
2025-05-09 23:59:09,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3058.0115, 1851.9572, 110.40552, 3057.8542, 3021.9775, 3074.2507, 2067.3147, 2808.7634, 636.1934, 3043.3843]
2025-05-09 23:59:09,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 601.0, 70.0, 1000.0, 1000.0, 1000.0, 710.0, 916.0, 232.0, 1000.0]
2025-05-09 23:59:09,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 20 minutes, 41 seconds)
2025-05-10 00:01:44,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:01:56,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2107.78198 ± 1119.609
2025-05-10 00:01:56,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [1377.2544, 2387.5264, 2762.202, 115.31784, 3113.1675, 3191.2532, 3162.171, 882.1914, 877.5374, 3209.1982]
2025-05-10 00:01:56,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [412.0, 748.0, 847.0, 69.0, 1000.0, 1000.0, 1000.0, 323.0, 325.0, 1000.0]
2025-05-10 00:01:56,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 30 seconds)
2025-05-10 00:04:35,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:04:50,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2520.86499 ± 827.454
2025-05-10 00:04:50,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2698.778, 3049.6814, 2981.7476, 2223.5684, 1509.173, 3093.255, 3081.053, 2988.3398, 3073.586, 509.46884]
2025-05-10 00:04:50,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [888.0, 1000.0, 1000.0, 737.0, 521.0, 1000.0, 1000.0, 1000.0, 1000.0, 209.0]
2025-05-10 00:04:50,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 22 seconds)
2025-05-10 00:07:33,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:07:42,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1710.15076 ± 1022.443
2025-05-10 00:07:42,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [2694.2976, 2987.4954, 906.23865, 1933.742, 354.96103, 2487.578, 342.3664, 1358.2921, 893.09357, 3143.4424]
2025-05-10 00:07:42,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [817.0, 943.0, 280.0, 577.0, 153.0, 793.0, 152.0, 461.0, 281.0, 1000.0]
2025-05-10 00:07:42,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 37 seconds)
2025-05-10 00:10:29,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:10:44,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2645.36719 ± 864.621
2025-05-10 00:10:44,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [3065.105, 3100.7017, 1302.7032, 3060.6816, 3079.0884, 3064.6936, 3059.6804, 589.2388, 3066.6035, 3065.176]
2025-05-10 00:10:44,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 445.0, 1000.0, 1000.0, 1000.0, 1000.0, 235.0, 1000.0, 1000.0]
2025-05-10 00:10:44,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 43 seconds)
2025-05-10 00:13:25,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:13:40,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 2526.86377 ± 956.183
2025-05-10 00:13:40,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [86.83092, 1528.4978, 3079.6606, 2171.7004, 3126.1301, 3090.9785, 3085.3892, 3059.0435, 2994.0774, 3046.33]
2025-05-10 00:13:40,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 479.0, 1000.0, 716.0, 1000.0, 1000.0, 1000.0, 1000.0, 973.0, 1000.0]
2025-05-10 00:13:40,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 48 seconds)
2025-05-10 00:16:26,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:16:35,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1539.60767 ± 932.752
2025-05-10 00:16:35,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [877.63226, 1855.478, 2183.9204, 200.97261, 2772.9595, 675.8224, 2045.9943, 635.6377, 1063.0049, 3084.6538]
2025-05-10 00:16:35,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [319.0, 625.0, 641.0, 108.0, 905.0, 263.0, 683.0, 245.0, 374.0, 1000.0]
2025-05-10 00:16:35,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 55 seconds)
2025-05-10 00:19:12,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:19:20,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 1308.95044 ± 1233.471
2025-05-10 00:19:20,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [554.6433, 3170.758, 1868.0698, 145.90332, 179.61293, 634.4635, 3190.638, 171.978, 2769.0479, 404.38965]
2025-05-10 00:19:20,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 1000.0, 588.0, 82.0, 104.0, 237.0, 1000.0, 99.0, 892.0, 172.0]
2025-05-10 00:19:20,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1251 [DEBUG]: Training session finished
