2025-08-07 09:22:14,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc15-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:22:14,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc15-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:22:14,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14826ddce1d0>}
2025-08-07 09:22:14,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 09:22:14,766 baseline-bpql-noiseperc15-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 09:22:14,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 09:22:14,783 baseline-bpql-noiseperc15-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 09:22:14,783 baseline-bpql-noiseperc15-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 09:22:15,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 09:22:15,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 09:23:47,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:23:59,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -385.42877 ± 30.645
2025-08-07 09:23:59,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-369.6396, -373.86487, -389.9465, -404.1665, -344.94995, -382.21448, -415.0933, -408.2581, -330.2786, -435.87607]
2025-08-07 09:23:59,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:23:59,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-385.43) for latency MM1Queue_a033_s075
2025-08-07 09:23:59,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 50 minutes, 39 seconds)
2025-08-07 09:25:36,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:25:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -108.14854 ± 71.150
2025-08-07 09:25:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-96.19606, -246.05348, -116.865654, -87.812416, -92.32443, -136.23389, 32.975628, -148.79372, -31.748978, -158.43248]
2025-08-07 09:25:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:25:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-108.15) for latency MM1Queue_a033_s075
2025-08-07 09:25:48,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 53 minutes, 25 seconds)
2025-08-07 09:27:25,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:27:37,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 0.89636 ± 96.617
2025-08-07 09:27:37,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-27.361433, 7.3287053, -44.158558, 8.47179, 123.0421, -101.52687, -11.773457, 143.44542, 97.970566, -186.47467]
2025-08-07 09:27:37,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:27:37,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (0.90) for latency MM1Queue_a033_s075
2025-08-07 09:27:37,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 53 minutes, 12 seconds)
2025-08-07 09:29:14,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:29:26,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 12.83213 ± 90.006
2025-08-07 09:29:26,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [91.89573, -120.83989, -78.76999, 63.30011, -36.596085, 141.03186, -74.66785, 104.579994, -54.70211, 93.08953]
2025-08-07 09:29:26,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:29:26,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (12.83) for latency MM1Queue_a033_s075
2025-08-07 09:29:26,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 52 minutes, 10 seconds)
2025-08-07 09:31:03,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:31:14,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 117.06722 ± 168.445
2025-08-07 09:31:14,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [327.77344, 60.12568, -287.81985, 99.78568, 235.5329, 83.892166, 326.77347, 195.44398, 88.063644, 41.101246]
2025-08-07 09:31:14,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:31:14,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (117.07) for latency MM1Queue_a033_s075
2025-08-07 09:31:14,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 50 minutes, 46 seconds)
2025-08-07 09:32:52,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:33:03,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 360.48785 ± 84.261
2025-08-07 09:33:03,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [225.75188, 334.0268, 327.19598, 304.34625, 323.0232, 313.92307, 528.15045, 365.46234, 480.5608, 402.4378]
2025-08-07 09:33:03,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:33:03,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (360.49) for latency MM1Queue_a033_s075
2025-08-07 09:33:03,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 50 minutes, 39 seconds)
2025-08-07 09:34:41,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:34:52,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 479.44586 ± 181.584
2025-08-07 09:34:52,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [543.3552, 632.70514, 166.91566, 515.1108, 461.56387, 705.166, 545.31824, 481.31268, 620.79614, 122.21465]
2025-08-07 09:34:52,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:34:52,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (479.45) for latency MM1Queue_a033_s075
2025-08-07 09:34:52,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 48 minutes, 49 seconds)
2025-08-07 09:36:30,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:36:41,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 629.54346 ± 114.803
2025-08-07 09:36:41,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [396.36914, 527.58044, 543.75696, 691.6111, 781.6246, 643.3122, 575.61743, 640.85114, 718.1481, 776.5638]
2025-08-07 09:36:41,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:36:41,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (629.54) for latency MM1Queue_a033_s075
2025-08-07 09:36:41,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 46 minutes, 54 seconds)
2025-08-07 09:38:18,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:29,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 612.99084 ± 92.786
2025-08-07 09:38:29,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [588.32434, 530.2772, 577.60034, 565.43835, 650.8614, 768.13727, 647.1695, 495.27466, 528.25165, 778.5742]
2025-08-07 09:38:29,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:38:29,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 44 minutes, 50 seconds)
2025-08-07 09:40:06,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:40:18,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 642.92517 ± 180.975
2025-08-07 09:40:18,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [584.55035, 839.1421, 782.0521, 623.23364, 809.7621, 521.2205, 370.29944, 614.25616, 370.26993, 914.46466]
2025-08-07 09:40:18,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:40:18,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (642.93) for latency MM1Queue_a033_s075
2025-08-07 09:40:18,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 43 minutes, 1 second)
2025-08-07 09:41:55,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:42:07,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 734.77307 ± 218.013
2025-08-07 09:42:07,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [726.9168, 142.19713, 754.9154, 964.6071, 832.87054, 843.6939, 874.847, 595.15045, 801.9834, 810.5492]
2025-08-07 09:42:07,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:42:07,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (734.77) for latency MM1Queue_a033_s075
2025-08-07 09:42:07,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 41 minutes, 10 seconds)
2025-08-07 09:43:44,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:43:55,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 845.25226 ± 157.515
2025-08-07 09:43:55,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [892.0058, 892.9628, 691.2309, 1235.4612, 866.2821, 777.41254, 783.2092, 665.43994, 711.2216, 937.2967]
2025-08-07 09:43:55,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:43:55,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (845.25) for latency MM1Queue_a033_s075
2025-08-07 09:43:55,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 39 minutes, 21 seconds)
2025-08-07 09:45:33,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:45:44,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 817.40143 ± 119.862
2025-08-07 09:45:44,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [623.1626, 765.5425, 1046.4337, 738.736, 954.31836, 711.1789, 801.3873, 786.31335, 932.9147, 814.027]
2025-08-07 09:45:44,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:45:44,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 37 minutes, 29 seconds)
2025-08-07 09:47:21,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:47:33,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 889.05743 ± 118.779
2025-08-07 09:47:33,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [944.38184, 809.87305, 937.54156, 876.72253, 980.7917, 978.54614, 822.9417, 1073.8386, 619.86334, 846.0737]
2025-08-07 09:47:33,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:47:33,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (889.06) for latency MM1Queue_a033_s075
2025-08-07 09:47:33,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 35 minutes, 51 seconds)
2025-08-07 09:49:10,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:49:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 914.06689 ± 65.665
2025-08-07 09:49:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [905.883, 891.4292, 938.7685, 855.23804, 940.1217, 981.45044, 953.0396, 929.47784, 991.076, 754.185]
2025-08-07 09:49:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:49:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (914.07) for latency MM1Queue_a033_s075
2025-08-07 09:49:22,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 34 minutes, 3 seconds)
2025-08-07 09:50:58,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:51:10,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 927.47284 ± 51.876
2025-08-07 09:51:10,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [937.063, 885.3719, 885.4133, 960.4003, 957.30597, 867.149, 1014.8427, 984.3137, 937.27563, 845.5926]
2025-08-07 09:51:10,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:51:10,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (927.47) for latency MM1Queue_a033_s075
2025-08-07 09:51:10,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 32 minutes, 2 seconds)
2025-08-07 09:52:46,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:52:57,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 959.26855 ± 59.778
2025-08-07 09:52:57,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [982.9774, 961.3598, 946.4248, 900.8486, 896.17883, 971.30145, 1073.7448, 1050.1923, 913.9184, 895.73956]
2025-08-07 09:52:57,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:52:57,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (959.27) for latency MM1Queue_a033_s075
2025-08-07 09:52:57,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 29 minutes, 52 seconds)
2025-08-07 09:54:33,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:54:44,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 985.98810 ± 102.495
2025-08-07 09:54:44,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1032.9033, 1031.5604, 923.24634, 998.9194, 853.4418, 873.4997, 1142.7196, 1162.11, 886.4593, 955.02167]
2025-08-07 09:54:44,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:54:44,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (985.99) for latency MM1Queue_a033_s075
2025-08-07 09:54:44,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 27 minutes, 42 seconds)
2025-08-07 09:56:20,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:56:31,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1073.75354 ± 135.537
2025-08-07 09:56:31,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1107.921, 876.17004, 1315.019, 1056.0995, 1229.4934, 913.1013, 917.8201, 1069.9968, 1075.3046, 1176.6099]
2025-08-07 09:56:31,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:56:31,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1073.75) for latency MM1Queue_a033_s075
2025-08-07 09:56:31,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 25 minutes, 26 seconds)
2025-08-07 09:58:07,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:58:18,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1067.88025 ± 110.419
2025-08-07 09:58:18,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1006.6285, 1071.1235, 1020.4547, 1023.6113, 1361.6733, 990.5213, 1055.1372, 960.406, 1029.3901, 1159.8579]
2025-08-07 09:58:18,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:58:18,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 23 minutes, 1 second)
2025-08-07 09:59:53,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:00:04,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1178.49048 ± 232.646
2025-08-07 10:00:04,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [985.19415, 1051.8068, 1556.393, 1605.9574, 1070.3881, 1017.39, 1150.6177, 1106.747, 1356.7777, 883.6325]
2025-08-07 10:00:04,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:00:04,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1178.49) for latency MM1Queue_a033_s075
2025-08-07 10:00:04,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 20 minutes, 45 seconds)
2025-08-07 10:01:40,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:01:51,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1013.34338 ± 126.136
2025-08-07 10:01:51,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1105.9656, 1100.2134, 820.3216, 874.0789, 1176.0665, 1029.8839, 1033.2167, 811.1729, 1146.7709, 1035.7435]
2025-08-07 10:01:51,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:01:51,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 18 minutes, 44 seconds)
2025-08-07 10:03:26,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:03:37,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1229.38171 ± 210.745
2025-08-07 10:03:37,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [939.69727, 1463.6609, 1040.4678, 1208.9962, 1160.1772, 1239.1918, 1554.1819, 1415.2233, 1367.3114, 904.90936]
2025-08-07 10:03:37,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:03:37,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1229.38) for latency MM1Queue_a033_s075
2025-08-07 10:03:37,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 16 minutes, 45 seconds)
2025-08-07 10:05:12,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:05:24,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1200.43726 ± 181.914
2025-08-07 10:05:24,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1004.24506, 1312.7595, 1069.296, 1147.8822, 1462.563, 1110.0687, 1467.0583, 1063.789, 1398.5769, 968.1342]
2025-08-07 10:05:24,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:05:24,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 14 minutes, 51 seconds)
2025-08-07 10:06:59,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:07:10,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1069.78870 ± 112.885
2025-08-07 10:07:10,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1039.5757, 1058.2351, 1000.52057, 1083.5271, 1037.2919, 1150.5742, 920.9548, 940.334, 1336.7404, 1130.133]
2025-08-07 10:07:10,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:07:10,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 13 minutes, 4 seconds)
2025-08-07 10:08:45,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:57,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1221.82837 ± 174.668
2025-08-07 10:08:57,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1181.8513, 1364.6516, 865.3474, 1236.3292, 1401.4192, 1232.1107, 1028.8259, 1493.1511, 1129.3335, 1285.2639]
2025-08-07 10:08:57,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:08:57,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 11 minutes, 20 seconds)
2025-08-07 10:10:31,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:43,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1460.00427 ± 292.752
2025-08-07 10:10:43,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1083.7025, 1740.3076, 1165.452, 1856.1337, 1240.1187, 1740.4009, 1595.8563, 1606.1909, 998.02527, 1573.855]
2025-08-07 10:10:43,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:10:43,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1460.00) for latency MM1Queue_a033_s075
2025-08-07 10:10:43,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 9 minutes, 26 seconds)
2025-08-07 10:12:17,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:28,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1171.32019 ± 128.947
2025-08-07 10:12:28,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [953.8824, 1195.8564, 1061.7025, 1254.0834, 1393.6561, 1104.0376, 1276.1146, 1301.8883, 1047.4279, 1124.554]
2025-08-07 10:12:28,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:12:28,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 7 minutes, 25 seconds)
2025-08-07 10:14:03,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:14,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1204.94788 ± 220.693
2025-08-07 10:14:14,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1115.7296, 1067.0956, 1523.7587, 1060.2052, 1190.1514, 797.62115, 1067.8899, 1395.272, 1298.8845, 1532.8699]
2025-08-07 10:14:14,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:14:14,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 5 minutes, 28 seconds)
2025-08-07 10:15:48,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:59,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1289.09851 ± 205.454
2025-08-07 10:15:59,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1077.0991, 1386.1935, 1364.0957, 1014.9405, 1153.165, 1606.0363, 1058.7272, 1198.902, 1471.5887, 1560.2379]
2025-08-07 10:15:59,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:15:59,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 3 minutes, 24 seconds)
2025-08-07 10:17:34,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1214.59192 ± 201.439
2025-08-07 10:17:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1025.4103, 1051.1208, 1317.7019, 1310.8726, 1147.8167, 1030.2421, 1155.3373, 1650.0184, 1014.91046, 1442.4895]
2025-08-07 10:17:45,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:17:45,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 1 minute, 34 seconds)
2025-08-07 10:19:20,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1369.36060 ± 239.788
2025-08-07 10:19:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1376.4507, 1498.0232, 1375.1212, 1333.6654, 1220.4865, 1266.6721, 1148.5975, 1741.9565, 955.96173, 1776.6705]
2025-08-07 10:19:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:19:31,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 59 minutes, 45 seconds)
2025-08-07 10:21:06,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:17,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1256.32874 ± 253.625
2025-08-07 10:21:17,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1072.6615, 1486.8534, 1047.5839, 1373.4436, 972.70844, 1605.8486, 998.5409, 1232.9331, 1073.5731, 1699.1407]
2025-08-07 10:21:17,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:21:17,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 58 minutes, 5 seconds)
2025-08-07 10:22:52,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:03,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1347.76233 ± 129.650
2025-08-07 10:23:03,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1316.4532, 1241.3427, 1324.1226, 1491.0826, 1573.3173, 1217.588, 1334.0854, 1383.7913, 1471.7183, 1124.1206]
2025-08-07 10:23:03,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:23:03,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 56 minutes, 21 seconds)
2025-08-07 10:24:37,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:49,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1459.67896 ± 339.511
2025-08-07 10:24:49,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [979.7889, 1334.5201, 1071.6606, 1521.1472, 1552.5864, 1046.9303, 1764.106, 1427.2472, 1947.6527, 1951.1505]
2025-08-07 10:24:49,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:24:49,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 54 minutes, 44 seconds)
2025-08-07 10:26:23,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:35,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1400.56177 ± 232.508
2025-08-07 10:26:35,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1421.2866, 967.30994, 1759.4718, 1130.7665, 1558.1237, 1536.544, 1583.9364, 1192.8013, 1306.6982, 1548.6782]
2025-08-07 10:26:35,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:26:35,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 52 minutes, 57 seconds)
2025-08-07 10:28:09,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:20,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1356.86499 ± 440.992
2025-08-07 10:28:20,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [972.11475, 986.4741, 1136.3729, 1371.7816, 2010.2214, 945.37164, 2011.7118, 1994.5586, 985.06055, 1154.982]
2025-08-07 10:28:20,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:28:20,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 51 minutes, 3 seconds)
2025-08-07 10:29:55,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:06,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1343.24146 ± 344.721
2025-08-07 10:30:06,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1557.1111, 2020.5875, 1101.6718, 938.36554, 1102.3705, 1853.7335, 1155.725, 1435.5743, 1208.9127, 1058.3624]
2025-08-07 10:30:06,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:30:06,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 49 minutes, 15 seconds)
2025-08-07 10:31:40,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:51,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1302.89966 ± 287.513
2025-08-07 10:31:51,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1106.4769, 1185.4745, 1270.3329, 1021.9269, 1053.4017, 1416.4622, 1666.2927, 1008.07245, 1368.5048, 1932.053]
2025-08-07 10:31:51,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:31:51,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 47 minutes, 29 seconds)
2025-08-07 10:33:26,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:37,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1398.35132 ± 328.739
2025-08-07 10:33:37,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1791.3905, 1763.5693, 1480.5724, 1123.157, 937.326, 1070.9578, 1552.3337, 1227.2938, 1125.5276, 1911.3846]
2025-08-07 10:33:37,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:33:37,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 45 minutes, 44 seconds)
2025-08-07 10:35:12,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:23,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1343.76758 ± 197.130
2025-08-07 10:35:23,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1154.9884, 1449.0729, 1293.7998, 1413.2535, 1278.7275, 1797.8408, 1383.3438, 1436.65, 1046.8993, 1183.1008]
2025-08-07 10:35:23,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:35:23,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 43 minutes, 56 seconds)
2025-08-07 10:36:58,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:09,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1304.10510 ± 315.406
2025-08-07 10:37:09,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1729.5647, 1905.7727, 1029.2844, 1510.0786, 1041.5839, 1282.1865, 1455.8999, 994.1827, 1124.8907, 967.60754]
2025-08-07 10:37:09,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:37:09,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 42 minutes, 14 seconds)
2025-08-07 10:38:43,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:54,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1422.07849 ± 388.467
2025-08-07 10:38:54,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2103.877, 1755.7056, 1890.1442, 1142.4532, 1104.9027, 1721.0127, 1246.773, 978.2187, 969.3046, 1308.3932]
2025-08-07 10:38:54,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:38:54,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 40 minutes, 28 seconds)
2025-08-07 10:40:29,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:40,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1632.10071 ± 239.568
2025-08-07 10:40:40,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1283.2894, 2041.2413, 1665.1917, 1485.02, 1505.5829, 1950.7812, 1735.2449, 1792.0325, 1313.5791, 1549.0438]
2025-08-07 10:40:40,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:40:40,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1632.10) for latency MM1Queue_a033_s075
2025-08-07 10:40:40,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 38 minutes, 42 seconds)
2025-08-07 10:42:15,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:26,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1314.49060 ± 385.796
2025-08-07 10:42:26,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1201.5537, 1040.8452, 1991.6974, 1158.9222, 1215.2173, 2142.9797, 1214.4907, 995.0805, 1165.2521, 1018.8671]
2025-08-07 10:42:26,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:42:26,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 36 minutes, 57 seconds)
2025-08-07 10:44:01,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:12,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1466.60999 ± 383.617
2025-08-07 10:44:12,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2316.4229, 1150.3867, 1364.9773, 1712.4436, 1244.8132, 1279.0608, 1007.1983, 1112.4784, 1807.2548, 1671.0632]
2025-08-07 10:44:12,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:44:12,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 35 minutes, 15 seconds)
2025-08-07 10:45:47,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:58,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1184.19238 ± 199.753
2025-08-07 10:45:58,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1593.0178, 1028.4948, 1046.1995, 994.4449, 1331.4242, 981.51263, 1094.8785, 1220.9413, 1454.1465, 1096.8629]
2025-08-07 10:45:58,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:45:58,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 33 minutes, 33 seconds)
2025-08-07 10:47:33,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:44,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1449.00806 ± 359.973
2025-08-07 10:47:44,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1014.99304, 2227.802, 1380.9766, 1234.3142, 1290.5137, 1477.8668, 1577.3474, 1790.3153, 1572.359, 923.5928]
2025-08-07 10:47:44,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:47:44,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 31 minutes, 47 seconds)
2025-08-07 10:49:19,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:30,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1312.97900 ± 309.922
2025-08-07 10:49:30,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1033.2957, 1825.4241, 999.29083, 1230.5446, 1017.8111, 1389.9155, 1379.6492, 1036.0409, 1896.28, 1321.5377]
2025-08-07 10:49:30,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:49:30,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 30 minutes, 3 seconds)
2025-08-07 10:51:04,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:16,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1349.38342 ± 336.421
2025-08-07 10:51:16,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1124.5973, 1519.5803, 1158.9153, 1662.1866, 2190.1733, 1253.7048, 1080.5048, 1202.4348, 1272.0735, 1029.6633]
2025-08-07 10:51:16,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:51:16,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 28 minutes, 13 seconds)
2025-08-07 10:52:50,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:02,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1455.22327 ± 432.267
2025-08-07 10:53:02,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1877.1389, 1241.7311, 1850.5845, 1093.612, 1107.4932, 1583.5396, 1112.2366, 2391.6882, 1281.3123, 1012.89655]
2025-08-07 10:53:02,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:53:02,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 26 minutes, 26 seconds)
2025-08-07 10:54:36,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:48,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1512.55872 ± 433.310
2025-08-07 10:54:48,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1325.4735, 1280.1802, 2303.395, 1593.0137, 2300.5046, 983.5307, 1540.9297, 1234.7681, 1092.7358, 1471.056]
2025-08-07 10:54:48,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:54:48,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 24 minutes, 42 seconds)
2025-08-07 10:56:22,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:33,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1365.99512 ± 315.408
2025-08-07 10:56:33,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1169.8329, 1414.6663, 1365.1831, 1053.9902, 2112.909, 1265.8678, 1279.1973, 1762.4342, 1139.7625, 1096.1079]
2025-08-07 10:56:33,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:56:33,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 22 minutes, 55 seconds)
2025-08-07 10:58:08,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:19,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1462.10388 ± 458.861
2025-08-07 10:58:19,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [994.6283, 1169.8925, 1054.252, 1279.8224, 1965.2927, 1055.2852, 2078.0789, 1217.4333, 2324.4626, 1481.8915]
2025-08-07 10:58:19,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:58:19,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 21 minutes, 8 seconds)
2025-08-07 10:59:54,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:05,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1799.29028 ± 448.646
2025-08-07 11:00:05,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1181.904, 1994.8927, 2136.759, 1203.958, 1359.7571, 1738.4385, 2525.2412, 1918.4597, 1559.9762, 2373.516]
2025-08-07 11:00:05,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:00:05,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1799.29) for latency MM1Queue_a033_s075
2025-08-07 11:00:05,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 19 minutes, 22 seconds)
2025-08-07 11:01:39,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:51,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1344.26343 ± 351.887
2025-08-07 11:01:51,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1287.33, 2051.0166, 1363.817, 1051.3241, 1695.6099, 1474.8605, 798.74756, 1204.7236, 1554.8663, 960.33844]
2025-08-07 11:01:51,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:01:51,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 17 minutes, 35 seconds)
2025-08-07 11:03:25,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:36,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1481.05640 ± 313.436
2025-08-07 11:03:36,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1466.4479, 1052.7341, 1359.6598, 1204.3705, 1756.1937, 1733.7523, 1749.1497, 1197.6132, 2070.5962, 1220.0455]
2025-08-07 11:03:36,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:03:36,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 15 minutes, 46 seconds)
2025-08-07 11:05:11,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:22,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1388.59753 ± 340.476
2025-08-07 11:05:22,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [991.6904, 1951.2063, 1519.2395, 1194.2404, 966.97974, 1794.5837, 1457.6206, 1598.2008, 912.472, 1499.7423]
2025-08-07 11:05:22,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:05:22,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 14 minutes, 3 seconds)
2025-08-07 11:06:57,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:08,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1493.30725 ± 610.260
2025-08-07 11:07:08,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1205.6364, 1144.3656, 1333.449, 676.3524, 2684.5046, 1149.1996, 1142.8502, 1180.9269, 2053.5273, 2362.2603]
2025-08-07 11:07:08,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:07:08,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 12 minutes, 17 seconds)
2025-08-07 11:08:43,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:54,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1311.50696 ± 325.355
2025-08-07 11:08:54,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1280.0521, 1232.496, 1149.3561, 1009.36786, 2188.0059, 1481.0349, 1414.9484, 1137.0297, 1199.4685, 1023.30914]
2025-08-07 11:08:54,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:08:54,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 10 minutes, 33 seconds)
2025-08-07 11:10:29,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:40,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1537.54956 ± 459.905
2025-08-07 11:10:40,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1675.6854, 1398.794, 1196.5187, 1497.0259, 1424.4537, 1035.9369, 1036.6027, 1376.6176, 2222.282, 2511.5784]
2025-08-07 11:10:40,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:10:40,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 8 minutes, 47 seconds)
2025-08-07 11:12:15,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:26,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1588.34436 ± 460.445
2025-08-07 11:12:26,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1363.1589, 1202.1296, 1565.7126, 2079.9717, 2471.709, 975.5483, 1732.5576, 1491.5732, 1012.3884, 1988.6937]
2025-08-07 11:12:26,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:12:26,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 7 minutes, 4 seconds)
2025-08-07 11:14:00,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:11,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1594.32617 ± 633.240
2025-08-07 11:14:11,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1084.3783, 912.1155, 1376.8224, 2900.0417, 1447.1769, 1446.701, 1142.5172, 1260.708, 2672.348, 1700.4532]
2025-08-07 11:14:11,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:14:11,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 5 minutes, 13 seconds)
2025-08-07 11:15:46,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:57,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1900.26208 ± 673.264
2025-08-07 11:15:57,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1489.3162, 2584.3242, 957.4354, 2118.2803, 1273.0338, 1811.3599, 2900.7627, 2562.1316, 946.9823, 2358.9932]
2025-08-07 11:15:57,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:15:57,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1900.26) for latency MM1Queue_a033_s075
2025-08-07 11:15:57,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 3 minutes, 27 seconds)
2025-08-07 11:17:32,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:43,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1502.48767 ± 341.716
2025-08-07 11:17:43,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2001.4927, 1864.9238, 1525.8204, 1910.5709, 1234.613, 997.1503, 1674.9247, 1279.277, 1040.3964, 1495.7081]
2025-08-07 11:17:43,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:17:43,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 1 minute, 42 seconds)
2025-08-07 11:19:16,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:27,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1591.27710 ± 365.161
2025-08-07 11:19:27,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2175.5063, 1886.3722, 1484.1912, 1477.1605, 1745.7522, 1372.5475, 2168.5156, 1091.5667, 1271.8334, 1239.3267]
2025-08-07 11:19:27,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:19:27,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 59 minutes, 45 seconds)
2025-08-07 11:21:00,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:21:11,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1566.80664 ± 545.415
2025-08-07 11:21:11,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [882.3699, 1233.7166, 1300.3444, 1947.5914, 1397.1488, 1686.8545, 1049.4819, 2051.3162, 2817.2715, 1301.9707]
2025-08-07 11:21:11,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:21:11,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 45 seconds)
2025-08-07 11:22:44,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:55,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1597.82776 ± 491.024
2025-08-07 11:22:55,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2060.7998, 1037.6249, 1593.9712, 1043.1171, 1669.3794, 2480.0955, 2110.5215, 1014.2881, 1769.2506, 1199.2299]
2025-08-07 11:22:55,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:22:55,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 51 seconds)
2025-08-07 11:24:27,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:38,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1518.37146 ± 428.227
2025-08-07 11:24:38,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1403.2234, 1231.3557, 2622.2307, 1598.4149, 1534.2155, 1505.9282, 1742.8148, 1436.7241, 942.03094, 1166.7761]
2025-08-07 11:24:38,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:24:38,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 53 minutes, 49 seconds)
2025-08-07 11:26:10,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:21,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1539.54150 ± 394.750
2025-08-07 11:26:21,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1017.9901, 1185.5833, 1999.3127, 1508.1337, 1410.763, 1866.1873, 2328.4915, 1332.0675, 1602.1862, 1144.699]
2025-08-07 11:26:21,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:26:21,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 49 seconds)
2025-08-07 11:27:53,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:04,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1990.46228 ± 531.062
2025-08-07 11:28:04,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2402.1858, 1644.9923, 2037.8041, 1276.3429, 2568.6113, 2969.4878, 1858.6265, 1213.4707, 2194.814, 1738.2875]
2025-08-07 11:28:04,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:28:04,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1990.46) for latency MM1Queue_a033_s075
2025-08-07 11:28:04,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 59 seconds)
2025-08-07 11:29:36,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:47,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1469.17065 ± 377.532
2025-08-07 11:29:47,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2047.2714, 1307.6757, 1340.5444, 2209.4473, 1258.2267, 1524.7106, 1245.1676, 1017.30554, 1071.4075, 1669.9504]
2025-08-07 11:29:47,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:29:47,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 13 seconds)
2025-08-07 11:31:24,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:35,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1538.27283 ± 427.977
2025-08-07 11:31:35,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1705.0005, 1565.6285, 1780.0063, 1083.0536, 1107.2858, 1155.9528, 2504.8057, 1536.1971, 1110.0138, 1834.7853]
2025-08-07 11:31:35,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:31:35,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 50 seconds)
2025-08-07 11:33:10,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:21,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1395.54163 ± 286.677
2025-08-07 11:33:21,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1848.623, 1746.806, 1609.8345, 929.5743, 1148.5437, 1534.7303, 1361.399, 1168.6555, 1503.8768, 1103.3735]
2025-08-07 11:33:21,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:33:21,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 21 seconds)
2025-08-07 11:34:56,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:07,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1488.21106 ± 337.890
2025-08-07 11:35:07,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1419.8052, 1000.4948, 1350.7261, 2123.5977, 1714.4305, 1453.7467, 1933.6088, 1283.8325, 1539.5491, 1062.3193]
2025-08-07 11:35:07,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:35:07,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 50 seconds)
2025-08-07 11:36:42,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:53,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1671.69177 ± 520.868
2025-08-07 11:36:53,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1127.9525, 1991.4359, 966.80096, 1173.4219, 2162.4065, 1083.3707, 2026.4994, 2206.9238, 2418.7515, 1559.3545]
2025-08-07 11:36:53,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:36:53,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 18 seconds)
2025-08-07 11:38:28,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:39,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1458.54211 ± 348.947
2025-08-07 11:38:39,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1124.4164, 1101.146, 1525.3525, 1892.9711, 2101.955, 1651.8885, 1194.8113, 1681.8519, 1279.2883, 1031.7405]
2025-08-07 11:38:39,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:38:39,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 40 minutes, 44 seconds)
2025-08-07 11:40:14,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:25,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1792.50037 ± 652.450
2025-08-07 11:40:25,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1534.9205, 1685.5396, 2984.981, 2398.2017, 1705.1897, 918.97034, 2719.1155, 1527.4763, 1071.5499, 1379.0579]
2025-08-07 11:40:25,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:40:25,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 52 seconds)
2025-08-07 11:42:00,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:12,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1612.37830 ± 344.600
2025-08-07 11:42:12,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2236.3882, 2078.4167, 1460.7024, 1258.4509, 1385.4921, 1759.1349, 1794.844, 1664.191, 1083.5726, 1402.5907]
2025-08-07 11:42:12,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:42:12,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 7 seconds)
2025-08-07 11:43:47,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:58,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1719.17737 ± 615.543
2025-08-07 11:43:58,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1039.1193, 2360.4912, 1104.2408, 1998.551, 2502.9973, 1879.1099, 1014.1368, 1126.4768, 2677.2283, 1489.422]
2025-08-07 11:43:58,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:43:58,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 35 minutes, 23 seconds)
2025-08-07 11:45:33,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:44,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1721.76685 ± 608.960
2025-08-07 11:45:44,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2429.457, 1867.5404, 1101.1202, 1630.7566, 960.27826, 1010.52637, 1995.4274, 2827.0334, 1245.3422, 2150.1865]
2025-08-07 11:45:44,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:45:44,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 38 seconds)
2025-08-07 11:47:19,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:30,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1418.38013 ± 269.755
2025-08-07 11:47:30,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1658.5353, 1797.9075, 1523.3842, 1058.1785, 1143.4879, 1437.7557, 1110.3999, 1420.901, 1204.3074, 1828.9445]
2025-08-07 11:47:30,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:47:30,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 52 seconds)
2025-08-07 11:49:06,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:17,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1730.02466 ± 645.537
2025-08-07 11:49:17,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2919.3909, 1401.4469, 1019.9074, 1816.0542, 1537.7969, 2984.0054, 1287.44, 1446.6165, 1239.4958, 1648.092]
2025-08-07 11:49:17,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:49:17,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 6 seconds)
2025-08-07 11:50:52,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:03,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1507.17700 ± 452.423
2025-08-07 11:51:03,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1299.3154, 1827.8232, 1189.8556, 1267.9174, 1692.034, 1085.951, 2640.8777, 1332.8998, 1664.75, 1070.3468]
2025-08-07 11:51:03,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:51:03,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 20 seconds)
2025-08-07 11:52:38,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:49,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1600.32471 ± 541.651
2025-08-07 11:52:49,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2114.5303, 1554.0388, 1429.8282, 1046.6632, 2463.0032, 1697.6664, 1068.1929, 1085.8286, 1074.8384, 2468.657]
2025-08-07 11:52:49,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:52:49,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 33 seconds)
2025-08-07 11:54:24,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:35,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1835.98206 ± 543.127
2025-08-07 11:54:35,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1315.4098, 1177.7985, 1519.1947, 2516.8875, 1588.0115, 1660.4988, 1743.7689, 2228.4636, 3007.4216, 1602.3644]
2025-08-07 11:54:35,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:54:35,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 45 seconds)
2025-08-07 11:56:10,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:21,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1646.35938 ± 474.228
2025-08-07 11:56:21,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1523.3787, 1247.0048, 1408.263, 1165.2435, 1756.3386, 2778.878, 1973.0006, 1305.2749, 2019.3933, 1286.8174]
2025-08-07 11:56:21,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:56:21,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 59 seconds)
2025-08-07 11:57:56,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:08,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1652.23071 ± 581.266
2025-08-07 11:58:08,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1631.9318, 1171.5568, 1129.3481, 3000.9673, 2347.634, 1477.7109, 1202.0283, 1895.649, 1525.8915, 1139.5887]
2025-08-07 11:58:08,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:58:08,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 14 seconds)
2025-08-07 11:59:43,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:54,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2171.02710 ± 626.527
2025-08-07 11:59:54,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2898.6362, 2922.5127, 2735.2014, 1437.3014, 2261.9663, 1897.6445, 1392.4264, 2466.7842, 2552.5808, 1145.2181]
2025-08-07 11:59:54,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:59:54,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (2171.03) for latency MM1Queue_a033_s075
2025-08-07 11:59:54,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 28 seconds)
2025-08-07 12:01:29,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:40,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1389.64551 ± 482.763
2025-08-07 12:01:40,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1264.4808, 1319.9569, 1041.43, 1032.5166, 1043.722, 945.9528, 2581.8896, 1181.4104, 1866.7834, 1618.3126]
2025-08-07 12:01:40,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:01:40,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 42 seconds)
2025-08-07 12:03:15,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:26,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1696.75488 ± 698.718
2025-08-07 12:03:26,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2464.3271, 2595.022, 1189.6486, 1862.009, 1471.9012, 2992.9958, 1126.0854, 1200.0762, 1123.7566, 941.7262]
2025-08-07 12:03:26,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:03:26,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 56 seconds)
2025-08-07 12:05:02,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:13,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1726.54529 ± 304.001
2025-08-07 12:05:13,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1418.2794, 1847.714, 1536.676, 2398.8418, 1650.9733, 1801.9185, 1406.74, 1729.6011, 2063.325, 1411.3835]
2025-08-07 12:05:13,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:05:13,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 10 seconds)
2025-08-07 12:06:48,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:59,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1499.33374 ± 373.626
2025-08-07 12:06:59,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1774.9489, 1283.7966, 1750.9625, 2212.3225, 1828.9077, 1270.3496, 1566.0138, 1055.0374, 953.0555, 1297.9429]
2025-08-07 12:06:59,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:06:59,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 23 seconds)
2025-08-07 12:08:34,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:45,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1648.59351 ± 669.590
2025-08-07 12:08:45,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2940.3997, 1571.6063, 1240.7026, 2712.678, 1021.2202, 1441.3754, 1177.2847, 2161.9622, 1165.8706, 1052.8365]
2025-08-07 12:08:45,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:08:45,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 37 seconds)
2025-08-07 12:10:20,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:32,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1389.37451 ± 471.946
2025-08-07 12:10:32,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1188.0332, 1174.4962, 2184.9739, 1100.7305, 1609.5099, 1188.5006, 2048.6868, 1219.8665, 501.16736, 1677.7816]
2025-08-07 12:10:32,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:10:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 51 seconds)
2025-08-07 12:12:06,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:18,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1702.63831 ± 616.541
2025-08-07 12:12:18,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2710.4194, 2378.5857, 1152.2924, 1463.742, 1392.8545, 2685.6814, 955.7462, 1697.1627, 1384.7113, 1205.1885]
2025-08-07 12:12:18,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:12:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 4 seconds)
2025-08-07 12:13:53,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:04,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1832.67712 ± 402.920
2025-08-07 12:14:04,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1787.3691, 1818.5215, 2107.8933, 1442.2031, 1084.7023, 2353.6846, 1709.879, 1868.2834, 1615.4834, 2538.7515]
2025-08-07 12:14:04,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:14:04,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 18 seconds)
2025-08-07 12:15:39,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:15:50,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1910.32849 ± 573.881
2025-08-07 12:15:50,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1553.1028, 953.84595, 1233.2123, 1551.3295, 2760.475, 2247.488, 2800.482, 1939.1993, 2078.3162, 1985.8333]
2025-08-07 12:15:50,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:15:50,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 32 seconds)
2025-08-07 12:17:25,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:36,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1591.11243 ± 471.762
2025-08-07 12:17:36,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1251.2695, 1258.8517, 950.72644, 1412.3252, 1481.6569, 1917.7054, 1974.4891, 2534.6614, 2012.1274, 1117.3119]
2025-08-07 12:17:36,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:17:36,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 46 seconds)
2025-08-07 12:19:11,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:22,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1835.36426 ± 548.175
2025-08-07 12:19:22,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1220.4989, 1302.6752, 1164.7089, 1351.2896, 2496.9548, 1944.4833, 2219.4846, 2043.1913, 2842.4092, 1767.9473]
2025-08-07 12:19:22,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:19:22,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1251 [DEBUG]: Training session finished
