2025-05-11 20:38:04,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4
2025-05-11 20:38:04,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4
2025-05-11 20:38:04,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7ea6b65c5c70>}
2025-05-11 20:38:04,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1111 [DEBUG]: using device: cpu
2025-05-11 20:38:04,065 baseline-bpql-noisy-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 4 != 24
2025-05-11 20:38:04,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-11 20:38:04,077 baseline-bpql-noisy-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=41, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-11 20:38:04,077 baseline-bpql-noisy-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 20:38:04,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-11 20:38:04,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-11 20:41:15,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:41:19,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 134.03769 ± 149.979
2025-05-11 20:41:19,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [68.92912, 29.952335, 56.780777, 33.87217, 308.93243, 105.90341, 521.72205, 71.69699, 56.740845, 85.84681]
2025-05-11 20:41:19,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [91.0, 133.0, 168.0, 142.0, 185.0, 218.0, 365.0, 184.0, 167.0, 199.0]
2025-05-11 20:41:19,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (134.04) for latency MM1Queue_a033_s075
2025-05-11 20:41:19,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:41:19,090 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 20:41:19,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 21 minutes, 16 seconds)
2025-05-11 20:44:53,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:44:56,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 190.24521 ± 116.761
2025-05-11 20:44:56,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [173.77327, 285.35846, 267.4557, 49.266434, 78.81351, 415.3428, 76.31365, 238.10474, 261.15906, 56.86455]
2025-05-11 20:44:56,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [128.0, 460.0, 152.0, 80.0, 110.0, 211.0, 107.0, 172.0, 280.0, 65.0]
2025-05-11 20:44:56,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (190.25) for latency MM1Queue_a033_s075
2025-05-11 20:44:56,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:44:56,793 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 20:44:56,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 36 minutes, 48 seconds)
2025-05-11 20:48:25,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:48:28,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 68.96407 ± 82.292
2025-05-11 20:48:28,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [52.87438, 22.671354, 235.52342, 71.25938, 3.0199144, 220.5217, 9.457391, 46.491856, 22.402922, 5.418353]
2025-05-11 20:48:28,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [239.0, 43.0, 311.0, 232.0, 174.0, 225.0, 85.0, 99.0, 197.0, 89.0]
2025-05-11 20:48:28,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 36 minutes, 21 seconds)
2025-05-11 20:52:02,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:52:05,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 173.07088 ± 175.567
2025-05-11 20:52:05,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [541.53125, 112.524826, 280.0024, 69.79034, 41.242783, -7.4587464, 30.918634, 390.67914, 20.870182, 250.6078]
2025-05-11 20:52:05,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [467.0, 111.0, 181.0, 68.0, 171.0, 141.0, 38.0, 208.0, 27.0, 199.0]
2025-05-11 20:52:05,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 36 minutes, 20 seconds)
2025-05-11 20:55:01,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:55:03,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 237.91769 ± 149.675
2025-05-11 20:55:03,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [510.09317, 144.64684, 78.926994, 293.91064, 15.493598, 400.35434, 361.7366, 188.21187, 279.8567, 105.94628]
2025-05-11 20:55:03,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [395.0, 99.0, 84.0, 169.0, 24.0, 285.0, 201.0, 108.0, 162.0, 238.0]
2025-05-11 20:55:03,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (237.92) for latency MM1Queue_a033_s075
2025-05-11 20:55:03,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:55:03,406 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 20:55:03,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 22 minutes, 41 seconds)
2025-05-11 20:58:00,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:58:02,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 126.55898 ± 110.620
2025-05-11 20:58:02,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [61.355835, 69.68541, 126.93032, 29.037565, 26.159946, 242.07143, 253.37564, 10.835911, 92.98356, 353.1543]
2025-05-11 20:58:02,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [86.0, 175.0, 98.0, 46.0, 39.0, 148.0, 170.0, 23.0, 132.0, 328.0]
2025-05-11 20:58:02,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 14 minutes, 23 seconds)
2025-05-11 21:01:17,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:01:20,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 183.08359 ± 118.820
2025-05-11 21:01:20,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [138.11902, 18.42737, 309.59998, 235.32054, 46.51832, 1.3760289, 342.68594, 291.2093, 199.63596, 247.94325]
2025-05-11 21:01:20,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [118.0, 34.0, 242.0, 218.0, 111.0, 12.0, 303.0, 237.0, 173.0, 207.0]
2025-05-11 21:01:20,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 4 minutes, 58 seconds)
2025-05-11 21:04:55,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:04:58,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 205.66797 ± 130.486
2025-05-11 21:04:58,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [194.59541, 9.928037, 44.962734, 281.2193, 319.93466, 201.5142, 244.89844, 219.21616, 466.63373, 73.77679]
2025-05-11 21:04:58,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 20.0, 79.0, 176.0, 178.0, 175.0, 166.0, 170.0, 281.0, 130.0]
2025-05-11 21:04:58,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 3 minutes, 29 seconds)
2025-05-11 21:08:24,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:08:25,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 137.59500 ± 98.493
2025-05-11 21:08:25,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [180.61066, 208.93672, 45.742756, 229.02867, 8.619689, 26.986732, 92.98419, 199.89099, 63.6973, 319.45242]
2025-05-11 21:08:25,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 203.0, 73.0, 172.0, 17.0, 59.0, 92.0, 177.0, 92.0, 203.0]
2025-05-11 21:08:25,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 57 minutes, 22 seconds)
2025-05-11 21:11:13,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:11:14,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 137.97049 ± 83.361
2025-05-11 21:11:14,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [154.60054, 247.83054, 82.79718, 7.6573954, 163.86955, 249.51135, 28.957737, 83.10382, 128.42006, 232.95663]
2025-05-11 21:11:14,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [115.0, 173.0, 118.0, 32.0, 116.0, 155.0, 36.0, 88.0, 133.0, 156.0]
2025-05-11 21:11:14,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 51 minutes, 24 seconds)
2025-05-11 21:14:07,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:14:09,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 214.10596 ± 116.907
2025-05-11 21:14:09,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [273.90836, 167.80171, 45.867615, 403.63565, 312.32568, 14.580315, 158.70796, 311.90442, 179.56114, 272.76672]
2025-05-11 21:14:09,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 101.0, 58.0, 270.0, 194.0, 28.0, 100.0, 153.0, 100.0, 179.0]
2025-05-11 21:14:09,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 46 minutes, 48 seconds)
2025-05-11 21:16:57,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:16:59,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 227.75827 ± 157.794
2025-05-11 21:16:59,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [226.0863, 50.399803, 262.5021, 343.1367, 67.92472, 361.18213, 206.4495, 29.330666, 567.778, 162.793]
2025-05-11 21:16:59,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 81.0, 140.0, 232.0, 87.0, 206.0, 127.0, 48.0, 420.0, 114.0]
2025-05-11 21:16:59,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 35 minutes, 21 seconds)
2025-05-11 21:19:51,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:19:54,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 212.30815 ± 109.877
2025-05-11 21:19:54,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [227.55731, 130.99683, 313.71503, 91.35475, 318.63934, 77.73022, 342.74942, 250.85059, 44.14785, 325.34024]
2025-05-11 21:19:54,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [220.0, 407.0, 222.0, 183.0, 261.0, 118.0, 339.0, 206.0, 91.0, 223.0]
2025-05-11 21:19:54,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 19 minutes, 54 seconds)
2025-05-11 21:22:44,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:22:46,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 239.92729 ± 148.536
2025-05-11 21:22:46,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [538.13995, 341.11877, 255.20235, 148.38988, 277.62894, 344.35306, 262.22375, 182.69073, 70.34403, -20.81865]
2025-05-11 21:22:46,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [322.0, 183.0, 147.0, 120.0, 203.0, 219.0, 202.0, 146.0, 77.0, 154.0]
2025-05-11 21:22:46,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (239.93) for latency MM1Queue_a033_s075
2025-05-11 21:22:46,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:22:46,886 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 21:22:46,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 6 minutes, 54 seconds)
2025-05-11 21:25:37,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:25:38,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 162.58632 ± 55.016
2025-05-11 21:25:38,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [124.921875, 144.13155, 122.19216, 121.32105, 221.51123, 269.05804, 102.546, 108.03588, 207.26224, 204.88315]
2025-05-11 21:25:38,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [157.0, 111.0, 94.0, 115.0, 128.0, 192.0, 90.0, 99.0, 127.0, 118.0]
2025-05-11 21:25:38,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 4 minutes, 51 seconds)
2025-05-11 21:28:37,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:28:40,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 262.39719 ± 149.414
2025-05-11 21:28:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [424.1264, 302.61993, 394.43774, 90.93905, 57.542034, 433.297, 346.7226, 83.13289, 382.78622, 108.36818]
2025-05-11 21:28:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [313.0, 239.0, 249.0, 192.0, 70.0, 300.0, 213.0, 188.0, 298.0, 101.0]
2025-05-11 21:28:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (262.40) for latency MM1Queue_a033_s075
2025-05-11 21:28:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:28:40,934 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 21:28:40,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 4 minutes, 4 seconds)
2025-05-11 21:32:12,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:32:15,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 355.15363 ± 69.929
2025-05-11 21:32:15,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [408.3482, 393.00018, 246.1347, 272.57785, 444.61917, 425.6456, 410.8818, 263.0081, 320.19876, 367.1218]
2025-05-11 21:32:15,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 200.0, 150.0, 149.0, 251.0, 221.0, 203.0, 166.0, 187.0, 191.0]
2025-05-11 21:32:15,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (355.15) for latency MM1Queue_a033_s075
2025-05-11 21:32:15,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:32:15,711 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 21:32:15,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 13 minutes, 33 seconds)
2025-05-11 21:35:39,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:35:42,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 245.98608 ± 60.740
2025-05-11 21:35:42,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [218.79416, 142.98228, 256.20352, 152.41623, 311.6812, 277.01154, 316.20963, 288.39084, 199.77312, 296.39822]
2025-05-11 21:35:42,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 250.0, 212.0, 178.0, 224.0, 204.0, 215.0, 217.0, 174.0, 219.0]
2025-05-11 21:35:42,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 19 minutes, 2 seconds)
2025-05-11 21:38:31,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:38:35,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 326.12735 ± 152.815
2025-05-11 21:38:35,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [331.43924, 415.7486, 407.25702, 245.07037, 495.10046, 573.723, 354.2732, 272.6698, 100.03283, 65.958885]
2025-05-11 21:38:35,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 299.0, 293.0, 188.0, 292.0, 452.0, 242.0, 308.0, 204.0, 107.0]
2025-05-11 21:38:35,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 16 minutes)
2025-05-11 21:41:34,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:41:37,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 345.11761 ± 129.892
2025-05-11 21:41:37,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [601.8363, 382.3348, 347.91086, 84.173294, 454.4544, 257.3079, 265.82092, 333.12894, 419.78815, 304.4207]
2025-05-11 21:41:37,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [470.0, 222.0, 224.0, 113.0, 251.0, 197.0, 192.0, 167.0, 301.0, 231.0]
2025-05-11 21:41:37,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 15 minutes, 32 seconds)
2025-05-11 21:44:28,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:44:31,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 262.77875 ± 89.906
2025-05-11 21:44:31,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [353.3182, 216.01105, 287.9526, 183.89662, 205.26538, 267.39746, 125.333824, 266.09348, 466.0486, 256.4702]
2025-05-11 21:44:31,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 166.0, 154.0, 115.0, 121.0, 148.0, 98.0, 147.0, 255.0, 176.0]
2025-05-11 21:44:31,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 10 minutes, 11 seconds)
2025-05-11 21:47:21,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:47:23,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 363.43314 ± 45.497
2025-05-11 21:47:23,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [306.49155, 421.95172, 344.64038, 429.83826, 363.58875, 421.0962, 371.88217, 326.86166, 297.2977, 350.6832]
2025-05-11 21:47:23,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 214.0, 189.0, 223.0, 167.0, 215.0, 211.0, 188.0, 173.0, 194.0]
2025-05-11 21:47:23,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (363.43) for latency MM1Queue_a033_s075
2025-05-11 21:47:23,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:47:23,734 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 21:47:23,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 56 minutes, 5 seconds)
2025-05-11 21:50:13,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:50:19,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 460.69403 ± 286.104
2025-05-11 21:50:19,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [277.52103, 301.96533, 310.54297, 1017.5623, 297.3072, 268.66196, 374.47888, 276.81177, 453.68637, 1028.4031]
2025-05-11 21:50:19,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 181.0, 175.0, 1000.0, 173.0, 130.0, 199.0, 169.0, 262.0, 1000.0]
2025-05-11 21:50:19,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (460.69) for latency MM1Queue_a033_s075
2025-05-11 21:50:19,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:50:19,169 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 21:50:19,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 45 minutes, 6 seconds)
2025-05-11 21:53:11,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:53:13,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 332.35846 ± 46.105
2025-05-11 21:53:13,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [313.64774, 251.87349, 368.4002, 269.43182, 373.49692, 322.88803, 379.83823, 365.74997, 294.81696, 383.44122]
2025-05-11 21:53:13,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 135.0, 202.0, 142.0, 195.0, 171.0, 166.0, 196.0, 139.0, 197.0]
2025-05-11 21:53:13,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 42 minutes, 37 seconds)
2025-05-11 21:56:04,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:56:07,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 425.10138 ± 39.786
2025-05-11 21:56:07,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [412.8439, 447.66074, 452.95596, 424.0587, 435.68503, 357.22403, 393.6708, 370.4631, 472.94202, 483.50903]
2025-05-11 21:56:07,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 256.0, 249.0, 230.0, 252.0, 177.0, 222.0, 212.0, 266.0, 283.0]
2025-05-11 21:56:07,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 37 minutes, 41 seconds)
2025-05-11 21:58:59,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:59:02,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 444.42041 ± 73.012
2025-05-11 21:59:02,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [414.2689, 422.672, 379.63168, 487.9052, 646.47034, 404.2705, 415.95978, 407.80945, 410.5634, 454.65308]
2025-05-11 21:59:02,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 240.0, 175.0, 280.0, 371.0, 231.0, 219.0, 218.0, 234.0, 258.0]
2025-05-11 21:59:02,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 35 minutes, 5 seconds)
2025-05-11 22:01:51,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:01:53,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 353.85034 ± 53.227
2025-05-11 22:01:53,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [332.93384, 355.46167, 366.03427, 225.53215, 421.1333, 324.09064, 357.08362, 373.77362, 427.7147, 354.7458]
2025-05-11 22:01:53,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 185.0, 190.0, 128.0, 234.0, 168.0, 191.0, 190.0, 275.0, 185.0]
2025-05-11 22:01:53,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 31 minutes, 45 seconds)
2025-05-11 22:04:45,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:04:48,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 361.27014 ± 60.084
2025-05-11 22:04:48,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [361.81702, 386.0754, 392.69077, 248.095, 302.23895, 464.2625, 411.1792, 293.41852, 371.9221, 381.00198]
2025-05-11 22:04:48,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 176.0, 189.0, 119.0, 143.0, 251.0, 211.0, 137.0, 182.0, 179.0]
2025-05-11 22:04:48,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 28 minutes, 35 seconds)
2025-05-11 22:07:39,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:07:41,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 363.14523 ± 33.970
2025-05-11 22:07:41,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [358.98013, 374.69623, 326.17126, 398.57935, 345.07843, 440.05933, 357.75, 332.2779, 325.07797, 372.78165]
2025-05-11 22:07:41,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 180.0, 155.0, 182.0, 162.0, 195.0, 166.0, 159.0, 154.0, 166.0]
2025-05-11 22:07:41,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 25 minutes, 17 seconds)
2025-05-11 22:10:30,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:10:32,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 378.04691 ± 32.970
2025-05-11 22:10:32,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [406.6697, 349.58597, 361.4329, 405.58484, 381.6554, 341.58902, 452.91983, 359.8012, 373.22137, 348.00882]
2025-05-11 22:10:32,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 160.0, 163.0, 178.0, 181.0, 156.0, 198.0, 164.0, 167.0, 166.0]
2025-05-11 22:10:32,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 21 minutes, 42 seconds)
2025-05-11 22:13:22,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:13:25,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 467.71533 ± 67.614
2025-05-11 22:13:25,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [498.7939, 432.32678, 426.1672, 526.4062, 389.9603, 520.47815, 507.3496, 580.082, 345.2437, 450.34534]
2025-05-11 22:13:25,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [244.0, 212.0, 207.0, 245.0, 195.0, 229.0, 243.0, 277.0, 174.0, 230.0]
2025-05-11 22:13:25,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (467.72) for latency MM1Queue_a033_s075
2025-05-11 22:13:25,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:13:25,134 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 22:13:25,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 18 minutes, 17 seconds)
2025-05-11 22:16:16,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:16:19,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 439.33813 ± 18.566
2025-05-11 22:16:19,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [442.64398, 416.46988, 429.65976, 458.41205, 436.07364, 485.4256, 426.29272, 428.98392, 435.337, 434.08304]
2025-05-11 22:16:19,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 209.0, 231.0, 246.0, 234.0, 248.0, 225.0, 231.0, 235.0, 224.0]
2025-05-11 22:16:19,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 16 minutes, 7 seconds)
2025-05-11 22:19:11,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:19:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 694.09265 ± 293.567
2025-05-11 22:19:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [640.5392, 723.46924, 327.6698, 1051.7871, 578.7894, 867.4341, 379.39984, 699.51465, 381.68185, 1290.6411]
2025-05-11 22:19:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 309.0, 157.0, 415.0, 235.0, 334.0, 187.0, 251.0, 169.0, 537.0]
2025-05-11 22:19:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (694.09) for latency MM1Queue_a033_s075
2025-05-11 22:19:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:19:15,339 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 22:19:15,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 13 minutes, 38 seconds)
2025-05-11 22:22:07,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:22:12,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 879.58740 ± 232.034
2025-05-11 22:22:12,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [900.9253, 612.5084, 537.936, 1029.3412, 1191.8823, 800.29736, 820.52954, 821.3522, 753.36755, 1327.734]
2025-05-11 22:22:12,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [381.0, 266.0, 227.0, 393.0, 497.0, 306.0, 348.0, 310.0, 324.0, 530.0]
2025-05-11 22:22:12,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (879.59) for latency MM1Queue_a033_s075
2025-05-11 22:22:12,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:22:12,231 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 22:22:12,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 11 minutes, 36 seconds)
2025-05-11 22:25:07,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:25:11,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 768.29974 ± 352.761
2025-05-11 22:25:11,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [964.7372, 738.70245, 716.07245, 1328.134, 791.1791, 630.8817, 25.397486, 388.36465, 1135.4619, 964.0662]
2025-05-11 22:25:11,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [389.0, 286.0, 270.0, 523.0, 355.0, 262.0, 42.0, 162.0, 435.0, 400.0]
2025-05-11 22:25:11,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 10 minutes, 26 seconds)
2025-05-11 22:28:03,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:28:08,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1022.20038 ± 382.281
2025-05-11 22:28:08,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [304.25156, 1098.3063, 858.21814, 1254.5972, 662.87177, 842.3697, 811.4489, 1538.8727, 1600.2747, 1250.793]
2025-05-11 22:28:08,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 495.0, 306.0, 465.0, 270.0, 369.0, 320.0, 583.0, 546.0, 467.0]
2025-05-11 22:28:08,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (1022.20) for latency MM1Queue_a033_s075
2025-05-11 22:28:08,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:28:08,664 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 22:28:08,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 8 minutes, 29 seconds)
2025-05-11 22:31:01,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:31:06,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 897.61163 ± 298.923
2025-05-11 22:31:06,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1398.5743, 750.32214, 355.44162, 1025.2698, 1131.2966, 927.8587, 660.44403, 545.24445, 1086.1682, 1095.4965]
2025-05-11 22:31:06,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [528.0, 270.0, 145.0, 377.0, 419.0, 359.0, 277.0, 199.0, 391.0, 399.0]
2025-05-11 22:31:06,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 6 minutes, 16 seconds)
2025-05-11 22:33:54,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:34:00,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1082.37024 ± 534.340
2025-05-11 22:34:00,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [757.0832, 1251.612, 428.2252, 1523.2393, 1103.4851, 1671.9939, 908.7468, 548.0616, 501.40497, 2129.8499]
2025-05-11 22:34:00,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 440.0, 169.0, 574.0, 423.0, 653.0, 310.0, 198.0, 189.0, 837.0]
2025-05-11 22:34:00,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (1082.37) for latency MM1Queue_a033_s075
2025-05-11 22:34:00,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:34:00,125 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 22:34:00,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 2 minutes, 51 seconds)
2025-05-11 22:36:51,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:37:03,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1751.20435 ± 555.828
2025-05-11 22:37:03,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1759.3964, 2311.5469, 2329.917, 580.6216, 2264.3933, 1131.9489, 2176.0156, 1899.2091, 1281.8063, 1777.1876]
2025-05-11 22:37:03,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [746.0, 1000.0, 983.0, 285.0, 1000.0, 400.0, 854.0, 701.0, 533.0, 688.0]
2025-05-11 22:37:03,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (1751.20) for latency MM1Queue_a033_s075
2025-05-11 22:37:03,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:37:03,008 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 22:37:03,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 1 minute, 7 seconds)
2025-05-11 22:39:58,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:40:13,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2382.79883 ± 297.177
2025-05-11 22:40:13,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2281.6968, 2448.9668, 2508.1748, 2651.7268, 2441.7854, 2497.778, 2535.8108, 2508.806, 2420.7786, 1532.4646]
2025-05-11 22:40:13,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 646.0]
2025-05-11 22:40:13,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (2382.80) for latency MM1Queue_a033_s075
2025-05-11 22:40:13,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:40:13,447 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 22:40:13,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 24 seconds)
2025-05-11 22:43:06,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:43:10,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 872.64307 ± 169.081
2025-05-11 22:43:10,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1116.3317, 954.58417, 1064.796, 697.459, 607.44965, 885.71576, 902.08264, 619.8636, 1002.6475, 875.5001]
2025-05-11 22:43:10,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [380.0, 310.0, 323.0, 245.0, 225.0, 319.0, 303.0, 236.0, 373.0, 316.0]
2025-05-11 22:43:10,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 57 minutes, 19 seconds)
2025-05-11 22:46:03,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:46:07,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 864.37561 ± 504.398
2025-05-11 22:46:07,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1393.2845, 226.0436, 211.8162, 896.2098, 1379.6406, -5.576255, 1415.2571, 1110.9171, 974.60364, 1041.5591]
2025-05-11 22:46:07,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [522.0, 116.0, 106.0, 329.0, 490.0, 16.0, 440.0, 395.0, 366.0, 386.0]
2025-05-11 22:46:07,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 54 minutes, 14 seconds)
2025-05-11 22:48:52,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:48:57,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1154.50610 ± 369.584
2025-05-11 22:48:57,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1274.5338, 1456.2097, 1001.69336, 1307.4912, 989.0385, 344.12595, 1051.3777, 1837.5441, 1302.9631, 980.08417]
2025-05-11 22:48:57,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [379.0, 568.0, 350.0, 436.0, 342.0, 216.0, 371.0, 614.0, 443.0, 338.0]
2025-05-11 22:48:57,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 50 minutes, 34 seconds)
2025-05-11 22:51:53,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:51:57,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1084.81824 ± 101.756
2025-05-11 22:51:57,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [965.80035, 1175.7739, 966.44934, 1030.0938, 1261.3484, 1217.958, 1066.579, 966.09204, 1112.7415, 1085.3466]
2025-05-11 22:51:57,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [293.0, 352.0, 342.0, 320.0, 397.0, 357.0, 374.0, 289.0, 362.0, 348.0]
2025-05-11 22:51:57,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 47 minutes, 2 seconds)
2025-05-11 22:54:48,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:54:56,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1623.52441 ± 753.135
2025-05-11 22:54:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [561.3281, 2128.1628, 458.90494, 2678.4746, 1710.8043, 1607.2859, 1816.2258, 2801.347, 1026.7577, 1445.9526]
2025-05-11 22:54:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [200.0, 757.0, 179.0, 822.0, 577.0, 531.0, 589.0, 1000.0, 304.0, 489.0]
2025-05-11 22:54:56,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 41 minutes, 57 seconds)
2025-05-11 22:57:50,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:58:01,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2222.42627 ± 542.204
2025-05-11 22:58:01,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2929.2485, 1602.53, 1876.2927, 2053.8428, 2262.4382, 1973.2117, 1712.1824, 1733.0544, 3090.5403, 2990.922]
2025-05-11 22:58:01,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 517.0, 614.0, 676.0, 725.0, 651.0, 577.0, 575.0, 919.0, 1000.0]
2025-05-11 22:58:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 40 minutes, 24 seconds)
2025-05-11 23:00:51,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:01:07,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2867.73413 ± 98.651
2025-05-11 23:01:07,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2930.9531, 2781.2273, 2887.2744, 3037.3909, 2748.1628, 2746.1863, 2872.6057, 3022.609, 2828.7166, 2822.2144]
2025-05-11 23:01:07,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 23:01:07,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (2867.73) for latency MM1Queue_a033_s075
2025-05-11 23:01:07,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:01:07,375 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 23:01:07,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 38 minutes, 59 seconds)
2025-05-11 23:04:05,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:04:13,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1714.51245 ± 502.762
2025-05-11 23:04:14,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1176.7698, 1956.8735, 1468.2793, 2139.9844, 2872.561, 1065.19, 1842.4802, 1385.34, 1491.5657, 1746.0802]
2025-05-11 23:04:14,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [388.0, 657.0, 462.0, 718.0, 1000.0, 366.0, 534.0, 467.0, 540.0, 598.0]
2025-05-11 23:04:14,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 38 minutes, 47 seconds)
2025-05-11 23:07:03,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:07:09,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1360.48621 ± 282.430
2025-05-11 23:07:09,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1342.3838, 1262.471, 927.79095, 1500.2301, 1314.8218, 1255.456, 1254.6204, 2104.0625, 1340.3711, 1302.6545]
2025-05-11 23:07:09,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [421.0, 399.0, 317.0, 481.0, 389.0, 388.0, 388.0, 626.0, 413.0, 394.0]
2025-05-11 23:07:09,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 34 minutes, 55 seconds)
2025-05-11 23:09:55,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:10:03,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1687.53979 ± 450.398
2025-05-11 23:10:03,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2383.4595, 1417.0166, 1291.5911, 2088.9114, 1420.0392, 1664.539, 2223.6257, 1047.016, 1232.088, 2107.1125]
2025-05-11 23:10:03,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [796.0, 456.0, 440.0, 672.0, 467.0, 538.0, 749.0, 336.0, 415.0, 706.0]
2025-05-11 23:10:04,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 31 minutes, 11 seconds)
2025-05-11 23:13:01,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:13:11,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1981.77380 ± 655.214
2025-05-11 23:13:11,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2954.6663, 1419.5303, 1341.9137, 1240.6858, 2124.7622, 2084.6016, 1758.7673, 2483.1963, 3110.379, 1299.2343]
2025-05-11 23:13:11,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 443.0, 423.0, 431.0, 702.0, 697.0, 575.0, 739.0, 954.0, 426.0]
2025-05-11 23:13:11,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 28 minutes, 33 seconds)
2025-05-11 23:16:02,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:16:07,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1030.46948 ± 267.275
2025-05-11 23:16:07,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1134.1857, 1141.5507, 778.6328, 966.76184, 1009.4939, 862.31494, 714.41986, 977.61115, 1730.9309, 988.79266]
2025-05-11 23:16:07,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [443.0, 381.0, 267.0, 314.0, 336.0, 297.0, 281.0, 324.0, 592.0, 332.0]
2025-05-11 23:16:07,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 24 minutes, 3 seconds)
2025-05-11 23:18:55,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:19:02,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1601.24536 ± 527.130
2025-05-11 23:19:02,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [251.38034, 1394.2085, 1203.4648, 2049.5042, 2044.712, 1759.1835, 1541.4098, 1944.1824, 2007.8945, 1816.5132]
2025-05-11 23:19:02,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 452.0, 407.0, 637.0, 587.0, 522.0, 478.0, 609.0, 603.0, 574.0]
2025-05-11 23:19:02,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 19 minutes, 15 seconds)
2025-05-11 23:21:57,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:22:03,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1509.79932 ± 550.044
2025-05-11 23:22:03,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [808.4065, 1526.6595, 1326.1849, 1509.5377, 1543.526, 1225.5492, 1282.5387, 3034.4155, 1533.6841, 1307.4916]
2025-05-11 23:22:03,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [283.0, 426.0, 406.0, 475.0, 472.0, 390.0, 404.0, 898.0, 476.0, 399.0]
2025-05-11 23:22:03,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 17 minutes, 9 seconds)
2025-05-11 23:24:53,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:25:03,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2262.98047 ± 429.261
2025-05-11 23:25:03,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1805.9565, 2744.117, 2585.6501, 1646.755, 2976.9395, 2051.8853, 1868.2784, 2656.403, 2031.5322, 2262.2886]
2025-05-11 23:25:03,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [546.0, 828.0, 775.0, 480.0, 890.0, 626.0, 542.0, 730.0, 633.0, 707.0]
2025-05-11 23:25:03,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 14 minutes, 52 seconds)
2025-05-11 23:27:53,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:28:00,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1553.11987 ± 490.686
2025-05-11 23:28:00,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1754.2816, 1061.9006, 1696.4039, 1229.0402, 1482.1233, 806.6813, 2653.0237, 1760.3093, 1848.1079, 1239.3276]
2025-05-11 23:28:00,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [532.0, 359.0, 510.0, 383.0, 483.0, 279.0, 811.0, 521.0, 613.0, 402.0]
2025-05-11 23:28:00,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 10 minutes, 23 seconds)
2025-05-11 23:30:55,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:31:01,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1439.43811 ± 625.563
2025-05-11 23:31:01,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1054.7928, 1271.9296, 1071.7356, 1428.9321, 562.15186, 1400.5797, 2185.8435, 1173.7062, 1325.3799, 2919.329]
2025-05-11 23:31:01,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 375.0, 342.0, 434.0, 197.0, 423.0, 591.0, 376.0, 396.0, 832.0]
2025-05-11 23:31:01,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 8 minutes, 6 seconds)
2025-05-11 23:33:54,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:34:01,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1577.74084 ± 745.855
2025-05-11 23:34:01,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3118.5269, 519.52606, 1875.1183, 2474.4072, 1208.4757, 796.2471, 1849.096, 1214.8677, 1105.6312, 1615.5127]
2025-05-11 23:34:01,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [876.0, 201.0, 535.0, 701.0, 373.0, 282.0, 546.0, 384.0, 357.0, 451.0]
2025-05-11 23:34:01,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 5 minutes, 50 seconds)
2025-05-11 23:36:53,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:37:06,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2483.87451 ± 600.920
2025-05-11 23:37:06,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3115.8203, 2966.3877, 2516.2163, 2263.0435, 1512.0317, 1878.836, 2938.7864, 2920.3606, 3135.6838, 1591.5791]
2025-05-11 23:37:06,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 934.0, 808.0, 700.0, 524.0, 619.0, 1000.0, 1000.0, 1000.0, 486.0]
2025-05-11 23:37:06,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 3 minutes, 19 seconds)
2025-05-11 23:39:55,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:40:03,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1819.67029 ± 574.981
2025-05-11 23:40:03,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1105.2223, 2733.719, 1455.2777, 2337.9058, 1535.5062, 2886.0615, 1507.9452, 1645.191, 1528.8816, 1460.9927]
2025-05-11 23:40:03,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 832.0, 443.0, 699.0, 482.0, 844.0, 437.0, 543.0, 487.0, 430.0]
2025-05-11 23:40:03,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 6 seconds)
2025-05-11 23:43:02,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:43:12,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2085.73828 ± 692.116
2025-05-11 23:43:12,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2426.3174, 1119.9309, 2483.3525, 2774.6167, 1952.2404, 1752.0414, 817.09985, 3103.226, 2602.7817, 1825.7778]
2025-05-11 23:43:12,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [705.0, 369.0, 786.0, 821.0, 571.0, 527.0, 278.0, 882.0, 767.0, 560.0]
2025-05-11 23:43:12,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 58 minutes, 34 seconds)
2025-05-11 23:46:07,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:46:16,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1935.60608 ± 590.147
2025-05-11 23:46:16,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2146.1099, 1412.3865, 1877.5939, 2390.3027, 1515.6606, 1439.1259, 1410.1299, 3434.17, 1864.6455, 1865.9348]
2025-05-11 23:46:16,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [626.0, 423.0, 532.0, 691.0, 480.0, 433.0, 459.0, 997.0, 582.0, 549.0]
2025-05-11 23:46:16,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 55 minutes, 50 seconds)
2025-05-11 23:48:58,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:49:04,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1463.23340 ± 524.772
2025-05-11 23:49:04,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2182.3406, 1461.6404, 1471.2328, 96.72503, 1660.6256, 1559.9385, 1412.4175, 1436.0374, 2012.8965, 1338.4802]
2025-05-11 23:49:04,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [598.0, 432.0, 435.0, 78.0, 527.0, 455.0, 427.0, 417.0, 578.0, 397.0]
2025-05-11 23:49:04,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 51 minutes, 23 seconds)
2025-05-11 23:52:06,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:52:13,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1702.35095 ± 506.045
2025-05-11 23:52:13,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1953.302, 1794.9312, 1851.3105, 1213.3944, 1245.4924, 3013.467, 1305.1765, 1438.3663, 1767.5748, 1440.4938]
2025-05-11 23:52:13,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [536.0, 548.0, 558.0, 394.0, 348.0, 826.0, 402.0, 445.0, 475.0, 416.0]
2025-05-11 23:52:13,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 48 minutes, 51 seconds)
2025-05-11 23:54:56,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:55:04,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1767.54272 ± 680.103
2025-05-11 23:55:04,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1221.3818, 2760.6409, 1297.5642, 1071.9784, 1383.768, 2348.4873, 1996.1864, 3035.9983, 1363.8997, 1195.5228]
2025-05-11 23:55:04,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [373.0, 859.0, 387.0, 356.0, 423.0, 696.0, 580.0, 857.0, 419.0, 395.0]
2025-05-11 23:55:04,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 45 minutes, 1 second)
2025-05-11 23:58:00,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:58:06,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1373.28491 ± 409.764
2025-05-11 23:58:06,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1742.3634, 1141.7471, 1114.9125, 2369.6782, 1076.1553, 1248.4442, 1226.8843, 1679.7942, 961.2912, 1171.578]
2025-05-11 23:58:06,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [537.0, 335.0, 325.0, 687.0, 334.0, 380.0, 410.0, 507.0, 301.0, 341.0]
2025-05-11 23:58:06,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 41 minutes, 19 seconds)
2025-05-12 00:01:08,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:01:23,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2838.76270 ± 834.797
2025-05-12 00:01:23,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3066.6846, 3053.2617, 3175.9229, 354.91266, 3297.2957, 2869.5784, 3106.8618, 3144.8755, 3212.4224, 3105.8125]
2025-05-12 00:01:23,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 960.0, 1000.0, 149.0, 1000.0, 825.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:01:23,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 39 minutes, 47 seconds)
2025-05-12 00:04:04,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:04:20,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 3031.38452 ± 77.702
2025-05-12 00:04:20,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2926.6084, 3129.066, 3030.7812, 3001.7803, 3043.031, 3017.9111, 3004.2139, 3100.6284, 2901.2175, 3158.6072]
2025-05-12 00:04:20,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:04:20,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (3031.38) for latency MM1Queue_a033_s075
2025-05-12 00:04:20,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 00:04:20,910 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-12 00:04:20,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 37 minutes, 42 seconds)
2025-05-12 00:07:13,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:07:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1939.79333 ± 271.050
2025-05-12 00:07:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1911.4457, 1506.378, 2068.8733, 1932.9257, 1930.542, 2268.6357, 2361.0454, 1450.6794, 2000.3925, 1967.0164]
2025-05-12 00:07:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [591.0, 481.0, 664.0, 627.0, 621.0, 670.0, 730.0, 464.0, 607.0, 620.0]
2025-05-12 00:07:22,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 33 minutes, 58 seconds)
2025-05-12 00:10:29,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:10:36,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1667.30566 ± 500.705
2025-05-12 00:10:36,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [868.36115, 1749.4542, 1596.3481, 1466.7828, 2881.924, 1424.0737, 1553.2611, 2011.7598, 1307.7252, 1813.3661]
2025-05-12 00:10:36,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 535.0, 546.0, 448.0, 825.0, 452.0, 455.0, 603.0, 383.0, 543.0]
2025-05-12 00:10:36,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 33 minutes, 15 seconds)
2025-05-12 00:13:17,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:13:28,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2431.19434 ± 551.464
2025-05-12 00:13:28,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3288.3613, 1870.505, 2837.6255, 3435.012, 2091.066, 2334.2124, 1790.4597, 2420.7175, 2336.7354, 1907.2488]
2025-05-12 00:13:28,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 530.0, 834.0, 1000.0, 635.0, 682.0, 525.0, 714.0, 653.0, 551.0]
2025-05-12 00:13:28,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 29 minutes, 6 seconds)
2025-05-12 00:16:21,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:16:27,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1307.40515 ± 252.200
2025-05-12 00:16:27,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1055.1149, 1386.9867, 1125.3959, 1914.1893, 1173.8857, 1553.597, 1378.0522, 1053.6904, 1231.5094, 1201.6299]
2025-05-12 00:16:27,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [360.0, 417.0, 345.0, 599.0, 378.0, 469.0, 401.0, 331.0, 406.0, 355.0]
2025-05-12 00:16:27,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 24 minutes, 23 seconds)
2025-05-12 00:19:20,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:19:27,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1722.36523 ± 345.211
2025-05-12 00:19:27,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1648.3691, 1648.1123, 2345.9666, 2155.156, 1857.8201, 1171.1293, 1670.2863, 1694.8746, 1827.1392, 1204.7979]
2025-05-12 00:19:27,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [484.0, 523.0, 681.0, 639.0, 543.0, 358.0, 501.0, 462.0, 486.0, 339.0]
2025-05-12 00:19:27,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 21 minutes, 36 seconds)
2025-05-12 00:22:25,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:22:41,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 3277.07544 ± 70.831
2025-05-12 00:22:41,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3237.4133, 3332.814, 3263.5117, 3365.563, 3166.6616, 3215.925, 3227.6316, 3244.4978, 3309.2593, 3407.4744]
2025-05-12 00:22:41,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 976.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:22:41,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (3277.08) for latency MM1Queue_a033_s075
2025-05-12 00:22:41,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 00:22:41,993 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-12 00:22:42,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 19 minutes, 40 seconds)
2025-05-12 00:25:28,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:25:33,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1299.90991 ± 170.137
2025-05-12 00:25:33,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1450.1571, 1542.2654, 1227.4109, 1076.1178, 1607.6805, 1312.5143, 1249.3774, 1194.882, 1097.7318, 1240.9626]
2025-05-12 00:25:33,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [455.0, 481.0, 386.0, 335.0, 447.0, 404.0, 395.0, 385.0, 350.0, 396.0]
2025-05-12 00:25:33,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 14 minutes, 44 seconds)
2025-05-12 00:28:42,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:28:55,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2315.62891 ± 686.887
2025-05-12 00:28:55,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2951.7598, 2389.6182, 2688.2217, 3599.6077, 1695.0858, 2576.2893, 2288.7075, 2270.9836, 1037.9608, 1658.0546]
2025-05-12 00:28:55,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [874.0, 665.0, 781.0, 1000.0, 521.0, 753.0, 635.0, 623.0, 294.0, 500.0]
2025-05-12 00:28:55,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 14 minutes, 11 seconds)
2025-05-12 00:31:51,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:32:02,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2583.25317 ± 634.948
2025-05-12 00:32:02,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2311.8904, 1759.0768, 2894.0767, 2223.8079, 2200.059, 1985.8201, 3777.6665, 3500.6655, 2964.4053, 2215.062]
2025-05-12 00:32:02,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [628.0, 515.0, 805.0, 616.0, 591.0, 550.0, 1000.0, 1000.0, 813.0, 599.0]
2025-05-12 00:32:02,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 11 minutes, 40 seconds)
2025-05-12 00:35:04,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:35:12,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2045.65845 ± 656.932
2025-05-12 00:35:12,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1938.217, 1933.1398, 2126.8857, 3445.2627, 1265.5999, 1515.8429, 1905.0798, 2898.1946, 2199.1396, 1229.2201]
2025-05-12 00:35:12,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [562.0, 537.0, 621.0, 1000.0, 348.0, 440.0, 537.0, 798.0, 639.0, 357.0]
2025-05-12 00:35:12,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 9 minutes, 17 seconds)
2025-05-12 00:38:02,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:38:15,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2927.76294 ± 513.984
2025-05-12 00:38:15,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3484.7493, 3388.52, 3348.9087, 2563.1663, 3300.2034, 2132.0098, 2732.1775, 3174.3054, 1988.7277, 3164.8616]
2025-05-12 00:38:15,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 962.0, 1000.0, 715.0, 1000.0, 663.0, 818.0, 1000.0, 607.0, 914.0]
2025-05-12 00:38:15,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 5 minutes, 21 seconds)
2025-05-12 00:41:07,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:41:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1652.06250 ± 357.438
2025-05-12 00:41:14,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1749.5024, 2579.967, 1256.9575, 1621.8975, 1530.3658, 1446.5638, 1911.6437, 1348.0647, 1477.9723, 1597.6898]
2025-05-12 00:41:14,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [485.0, 727.0, 366.0, 465.0, 428.0, 423.0, 500.0, 413.0, 427.0, 461.0]
2025-05-12 00:41:14,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 2 minutes, 41 seconds)
2025-05-12 00:43:54,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:44:06,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2521.39673 ± 690.400
2025-05-12 00:44:06,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1334.3064, 2588.951, 1974.322, 3340.8584, 3406.8364, 2121.3926, 3307.7583, 1832.0928, 2260.3813, 3047.0679]
2025-05-12 00:44:06,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [439.0, 741.0, 599.0, 988.0, 1000.0, 627.0, 943.0, 571.0, 687.0, 883.0]
2025-05-12 00:44:06,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 57 minutes, 41 seconds)
2025-05-12 00:47:08,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:47:17,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2173.71265 ± 851.810
2025-05-12 00:47:17,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1813.9161, 2690.0288, 3364.014, 2369.0222, 2161.4236, 2377.3071, 2786.865, 1754.837, -5.413, 2425.125]
2025-05-12 00:47:17,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [534.0, 717.0, 1000.0, 709.0, 577.0, 679.0, 777.0, 533.0, 19.0, 708.0]
2025-05-12 00:47:17,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 54 minutes, 53 seconds)
2025-05-12 00:49:59,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:50:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2495.52075 ± 701.126
2025-05-12 00:50:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1893.3019, 1466.5708, 3212.7556, 2752.8604, 2535.9338, 1602.5284, 3283.2825, 2284.9397, 3674.061, 2248.9746]
2025-05-12 00:50:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [532.0, 479.0, 933.0, 798.0, 730.0, 475.0, 1000.0, 629.0, 1000.0, 646.0]
2025-05-12 00:50:09,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 50 minutes, 51 seconds)
2025-05-12 00:53:04,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:53:16,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2788.80127 ± 739.358
2025-05-12 00:53:16,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2818.3545, 1779.726, 2819.7393, 2732.7542, 3579.303, 1570.1058, 3533.3074, 1982.914, 3454.933, 3616.8774]
2025-05-12 00:53:16,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [813.0, 579.0, 744.0, 783.0, 1000.0, 469.0, 1000.0, 563.0, 1000.0, 1000.0]
2025-05-12 00:53:16,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 48 minutes, 2 seconds)
2025-05-12 00:56:09,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:56:22,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2923.36548 ± 629.788
2025-05-12 00:56:22,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3152.721, 3723.8953, 3589.2031, 1925.4756, 2150.1729, 2656.6729, 3414.375, 3446.1218, 2126.6194, 3048.3992]
2025-05-12 00:56:22,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [839.0, 1000.0, 1000.0, 542.0, 606.0, 722.0, 1000.0, 1000.0, 628.0, 863.0]
2025-05-12 00:56:22,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 45 minutes, 25 seconds)
2025-05-12 00:59:16,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:59:26,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2365.40894 ± 773.013
2025-05-12 00:59:26,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1822.0438, 3542.8752, 1356.4762, 2133.3303, 1455.0082, 3193.2678, 1820.0587, 2467.9678, 2315.5322, 3547.5315]
2025-05-12 00:59:26,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [510.0, 1000.0, 411.0, 570.0, 427.0, 868.0, 518.0, 701.0, 631.0, 1000.0]
2025-05-12 00:59:26,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 42 minutes, 55 seconds)
2025-05-12 01:02:18,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:02:29,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2449.09326 ± 830.861
2025-05-12 01:02:29,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3340.164, 2285.5947, 3311.0437, 2604.4656, 1599.7322, 1578.4568, 1722.361, 3450.4084, 3362.4692, 1236.2377]
2025-05-12 01:02:29,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 646.0, 938.0, 757.0, 477.0, 464.0, 536.0, 1000.0, 1000.0, 374.0]
2025-05-12 01:02:29,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 39 minutes, 32 seconds)
2025-05-12 01:05:26,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:05:37,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2371.78540 ± 974.981
2025-05-12 01:05:37,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2508.5146, 1675.8945, 2051.1294, 1135.5626, 3079.0005, 3254.5718, 3581.3115, 667.84906, 3658.3413, 2105.677]
2025-05-12 01:05:37,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [704.0, 463.0, 584.0, 353.0, 861.0, 911.0, 962.0, 218.0, 963.0, 605.0]
2025-05-12 01:05:37,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 37 minutes, 5 seconds)
2025-05-12 01:08:19,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:08:27,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1912.10120 ± 600.083
2025-05-12 01:08:27,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2222.712, 860.0586, 1783.7327, 1802.459, 1336.2561, 1725.257, 1748.7323, 3224.3164, 2406.3918, 2011.0961]
2025-05-12 01:08:27,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [619.0, 252.0, 514.0, 519.0, 421.0, 504.0, 482.0, 824.0, 692.0, 571.0]
2025-05-12 01:08:27,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 33 minutes, 24 seconds)
2025-05-12 01:11:15,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:11:23,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 1979.53149 ± 290.906
2025-05-12 01:11:23,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2064.9177, 1774.3843, 1506.3868, 2046.9934, 2304.401, 2049.9724, 1582.6757, 1825.9056, 2169.418, 2470.2605]
2025-05-12 01:11:23,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [584.0, 507.0, 443.0, 543.0, 651.0, 569.0, 471.0, 549.0, 613.0, 676.0]
2025-05-12 01:11:23,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 30 minutes, 1 second)
2025-05-12 01:14:17,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:14:30,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 3114.27197 ± 922.867
2025-05-12 01:14:30,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3685.857, 3832.257, 2321.4216, 3540.7856, 2714.1545, 706.6376, 3687.935, 3582.4626, 3580.1497, 3491.0593]
2025-05-12 01:14:30,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 665.0, 1000.0, 747.0, 234.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:14:30,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 27 minutes, 7 seconds)
2025-05-12 01:17:46,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:18:06,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 3399.51807 ± 156.459
2025-05-12 01:18:06,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3557.4731, 2968.9258, 3436.6943, 3414.024, 3551.8145, 3411.1228, 3422.7239, 3353.0537, 3482.0505, 3397.2954]
2025-05-12 01:18:06,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 822.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:18:06,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (3399.52) for latency MM1Queue_a033_s075
2025-05-12 01:18:06,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 01:18:06,125 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-12 01:18:06,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 24 minutes, 57 seconds)
2025-05-12 01:21:25,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:21:38,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2461.16455 ± 807.504
2025-05-12 01:21:38,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3400.3955, 3597.3462, 3539.4407, 1623.549, 1698.3455, 1493.2083, 2886.4207, 1941.7959, 2658.9834, 1772.1606]
2025-05-12 01:21:38,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [874.0, 960.0, 1000.0, 450.0, 494.0, 432.0, 777.0, 537.0, 726.0, 512.0]
2025-05-12 01:21:38,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 22 minutes, 25 seconds)
2025-05-12 01:24:43,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:24:56,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 3080.52466 ± 754.261
2025-05-12 01:24:56,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3125.6772, 3693.0742, 3855.9707, 3762.9575, 3540.524, 3539.5269, 1577.7406, 2358.9302, 2077.406, 3273.4368]
2025-05-12 01:24:56,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [796.0, 1000.0, 948.0, 1000.0, 959.0, 1000.0, 454.0, 649.0, 566.0, 903.0]
2025-05-12 01:24:56,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 46 seconds)
2025-05-12 01:27:12,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:27:22,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 3417.48438 ± 268.611
2025-05-12 01:27:22,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3439.1643, 3003.745, 3427.4019, 2839.192, 3717.4045, 3553.7417, 3392.543, 3588.9138, 3658.6343, 3554.1033]
2025-05-12 01:27:22,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 868.0, 1000.0, 777.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:27:22,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (3417.48) for latency MM1Queue_a033_s075
2025-05-12 01:27:22,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 01:27:22,239 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-12 01:27:22,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 59 seconds)
2025-05-12 01:29:08,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:29:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 2078.27734 ± 827.195
2025-05-12 01:29:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [1834.2458, 2433.3547, 1034.7314, 1716.8165, 3444.279, 2616.9773, 1821.962, 1752.6274, 3310.6692, 817.1099]
2025-05-12 01:29:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [520.0, 685.0, 293.0, 501.0, 1000.0, 712.0, 553.0, 530.0, 1000.0, 244.0]
2025-05-12 01:29:14,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 46 seconds)
2025-05-12 01:31:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:31:10,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 3349.14600 ± 56.914
2025-05-12 01:31:10,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3275.8428, 3351.226, 3348.816, 3354.539, 3374.0325, 3337.6726, 3296.3098, 3319.461, 3334.89, 3498.6716]
2025-05-12 01:31:10,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:31:10,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 50 seconds)
2025-05-12 01:32:57,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:33:06,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 3475.79541 ± 586.894
2025-05-12 01:33:06,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3652.1863, 3533.7344, 3705.363, 3702.7134, 3658.976, 3710.5613, 1725.6389, 3617.5842, 3793.265, 3657.9302]
2025-05-12 01:33:06,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 505.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:33:06,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1226 [INFO]: New best (3475.80) for latency MM1Queue_a033_s075
2025-05-12 01:33:06,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 01:33:06,566 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-12 01:33:06,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 35 seconds)
2025-05-12 01:34:47,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:34:57,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 3266.78638 ± 414.186
2025-05-12 01:34:57,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [2053.6494, 3587.4128, 3420.6125, 3359.248, 3465.3806, 3469.4358, 3424.0957, 3285.6848, 3321.3152, 3281.0293]
2025-05-12 01:34:57,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [603.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:34:57,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes)
2025-05-12 01:36:49,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:36:58,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1221 [DEBUG]: Total Reward: 3248.30078 ± 649.190
2025-05-12 01:36:58,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1222 [DEBUG]: All rewards: [3410.3982, 3824.2424, 3630.4644, 3416.332, 2097.8225, 3404.7466, 3712.1204, 3554.393, 1862.0927, 3570.3965]
2025-05-12 01:36:58,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 574.0, 895.0, 1000.0, 1000.0, 552.0, 1000.0]
2025-05-12 01:36:58,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1251 [DEBUG]: Training session finished
