2025-08-07 09:26:47,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc20-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:26:47,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc20-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:26:47,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x149dacd437d0>}
2025-08-07 09:26:47,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 09:26:47,544 baseline-bpql-noiseperc20-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 09:26:47,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 09:26:47,560 baseline-bpql-noiseperc20-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 09:26:47,561 baseline-bpql-noiseperc20-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 09:26:48,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 09:26:48,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 09:28:24,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:28:36,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -302.16562 ± 34.058
2025-08-07 09:28:36,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-298.7823, -302.1044, -273.22076, -299.95218, -271.81186, -335.82907, -338.77725, -250.511, -282.8564, -367.81094]
2025-08-07 09:28:36,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:28:36,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-302.17) for latency MM1Queue_a033_s075
2025-08-07 09:28:36,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 57 minutes, 56 seconds)
2025-08-07 09:30:17,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:30:29,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -223.56998 ± 85.323
2025-08-07 09:30:29,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-286.33496, -215.61328, -287.06064, 7.669157, -215.12106, -221.56131, -286.9054, -255.56122, -183.80026, -291.41077]
2025-08-07 09:30:29,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:30:29,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-223.57) for latency MM1Queue_a033_s075
2025-08-07 09:30:29,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 13 seconds)
2025-08-07 09:32:10,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:32:22,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -174.78323 ± 94.017
2025-08-07 09:32:22,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [61.501133, -157.95381, -187.21133, -182.36275, -204.53748, -208.14656, -313.86707, -161.00436, -266.17627, -128.07388]
2025-08-07 09:32:22,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:32:22,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-174.78) for latency MM1Queue_a033_s075
2025-08-07 09:32:22,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 59 minutes, 45 seconds)
2025-08-07 09:34:03,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:34:14,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -159.64360 ± 60.478
2025-08-07 09:34:14,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-220.88809, -117.17197, -216.82167, -255.52542, -110.1064, -84.837326, -230.77469, -107.913086, -109.82378, -142.57372]
2025-08-07 09:34:14,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:34:14,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-159.64) for latency MM1Queue_a033_s075
2025-08-07 09:34:14,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 58 minutes, 33 seconds)
2025-08-07 09:35:56,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:36:07,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -109.25873 ± 70.259
2025-08-07 09:36:07,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-19.757885, -12.549566, -121.092415, -166.36543, -185.04095, -48.394688, -151.94737, -31.854582, -146.60966, -208.97462]
2025-08-07 09:36:07,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:36:07,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-109.26) for latency MM1Queue_a033_s075
2025-08-07 09:36:07,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 57 minutes, 5 seconds)
2025-08-07 09:37:49,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:00,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -48.21655 ± 80.738
2025-08-07 09:38:00,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-6.7261577, -60.57135, -99.72684, 129.53853, -124.25393, -106.911835, -93.18131, -65.99528, -118.305565, 63.968273]
2025-08-07 09:38:00,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:38:00,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-48.22) for latency MM1Queue_a033_s075
2025-08-07 09:38:00,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 56 minutes, 47 seconds)
2025-08-07 09:39:41,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:39:53,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 81.51026 ± 112.390
2025-08-07 09:39:53,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [200.17508, 13.9174, 281.0796, 251.4327, 12.089418, 10.635464, 43.355145, -10.849914, -56.304684, 69.572464]
2025-08-07 09:39:53,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:39:53,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (81.51) for latency MM1Queue_a033_s075
2025-08-07 09:39:53,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 54 minutes, 56 seconds)
2025-08-07 09:41:34,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:41:46,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 233.98288 ± 116.733
2025-08-07 09:41:46,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [305.78064, 87.82401, 356.2287, 258.5235, 333.2208, 378.02036, 261.6344, 62.081394, 244.68031, 51.834763]
2025-08-07 09:41:46,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:41:46,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (233.98) for latency MM1Queue_a033_s075
2025-08-07 09:41:46,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 53 minutes, 2 seconds)
2025-08-07 09:43:27,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:43:39,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 297.29178 ± 271.047
2025-08-07 09:43:39,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [430.28375, 431.74655, 364.59885, 406.15488, 379.43567, -277.66586, -196.60704, 483.0591, 437.82257, 514.0894]
2025-08-07 09:43:39,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:43:39,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (297.29) for latency MM1Queue_a033_s075
2025-08-07 09:43:39,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 51 minutes, 12 seconds)
2025-08-07 09:45:20,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:45:32,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 439.98383 ± 49.721
2025-08-07 09:45:32,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [429.59232, 494.3764, 444.82562, 433.96375, 336.40982, 454.4055, 450.43442, 475.82477, 371.93533, 508.07065]
2025-08-07 09:45:32,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:45:32,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (439.98) for latency MM1Queue_a033_s075
2025-08-07 09:45:32,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 49 minutes, 19 seconds)
2025-08-07 09:47:13,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:47:24,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 620.98071 ± 77.352
2025-08-07 09:47:24,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [549.5968, 583.77234, 578.1802, 618.6233, 642.1324, 756.09143, 667.1976, 500.8969, 739.45795, 573.8578]
2025-08-07 09:47:24,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:47:24,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (620.98) for latency MM1Queue_a033_s075
2025-08-07 09:47:24,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 47 minutes, 25 seconds)
2025-08-07 09:49:06,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:49:17,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 644.46698 ± 118.258
2025-08-07 09:49:17,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [814.98914, 754.577, 489.71613, 606.6524, 602.5148, 691.4301, 740.61707, 403.69147, 650.75946, 689.7227]
2025-08-07 09:49:17,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:49:17,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (644.47) for latency MM1Queue_a033_s075
2025-08-07 09:49:17,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 45 minutes, 31 seconds)
2025-08-07 09:50:59,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:51:10,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 684.78796 ± 75.778
2025-08-07 09:51:10,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [727.16943, 815.97205, 711.12744, 752.15497, 701.00775, 726.4707, 626.39575, 580.70685, 560.3885, 646.4861]
2025-08-07 09:51:10,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:51:10,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (684.79) for latency MM1Queue_a033_s075
2025-08-07 09:51:10,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 43 minutes, 39 seconds)
2025-08-07 09:52:51,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:53:03,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 701.30432 ± 86.925
2025-08-07 09:53:03,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [649.44336, 745.6712, 814.06476, 642.0348, 738.37354, 642.4269, 741.6193, 512.2728, 812.152, 714.98505]
2025-08-07 09:53:03,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:53:03,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (701.30) for latency MM1Queue_a033_s075
2025-08-07 09:53:03,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 41 minutes, 44 seconds)
2025-08-07 09:54:44,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:54:56,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 687.58630 ± 79.711
2025-08-07 09:54:56,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [629.8231, 606.0141, 776.5717, 622.65076, 808.8677, 571.8842, 744.45276, 710.7233, 634.95807, 769.9178]
2025-08-07 09:54:56,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:54:56,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 39 minutes, 52 seconds)
2025-08-07 09:56:37,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:56:49,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 895.10828 ± 166.555
2025-08-07 09:56:49,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [949.46765, 1159.5433, 703.98804, 789.7816, 1126.0475, 734.0119, 648.00946, 969.5004, 997.7642, 872.9691]
2025-08-07 09:56:49,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:56:49,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (895.11) for latency MM1Queue_a033_s075
2025-08-07 09:56:49,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 37 minutes, 59 seconds)
2025-08-07 09:58:30,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:58:42,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 731.80219 ± 89.507
2025-08-07 09:58:42,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [822.34546, 838.1632, 804.6678, 567.60974, 785.4422, 744.7479, 627.4994, 799.1434, 644.9922, 683.4105]
2025-08-07 09:58:42,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:58:42,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 36 minutes, 8 seconds)
2025-08-07 10:00:23,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:00:34,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 744.41272 ± 116.077
2025-08-07 10:00:34,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [786.162, 511.91116, 900.7053, 845.1489, 595.85004, 758.476, 849.2974, 649.10474, 761.6687, 785.8025]
2025-08-07 10:00:34,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:00:34,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 34 minutes, 12 seconds)
2025-08-07 10:02:16,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:02:27,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 764.08435 ± 73.909
2025-08-07 10:02:27,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [762.22626, 714.8256, 637.8075, 767.21564, 694.2532, 903.02954, 764.1734, 826.31824, 725.2163, 845.7781]
2025-08-07 10:02:27,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:02:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 32 minutes, 18 seconds)
2025-08-07 10:04:08,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:04:20,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 789.23828 ± 142.691
2025-08-07 10:04:20,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [699.68866, 521.075, 937.9581, 762.6031, 846.0818, 858.4129, 783.2877, 725.57806, 688.21277, 1069.4846]
2025-08-07 10:04:20,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:04:20,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 30 minutes, 25 seconds)
2025-08-07 10:06:01,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:06:13,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 787.24121 ± 120.963
2025-08-07 10:06:13,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [653.092, 631.1467, 689.7722, 887.8671, 859.66833, 855.05963, 1033.6411, 836.38776, 725.21765, 700.55945]
2025-08-07 10:06:13,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:06:13,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 28 minutes, 32 seconds)
2025-08-07 10:07:54,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:06,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 768.52216 ± 260.345
2025-08-07 10:08:06,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [786.3915, 860.6428, 917.0352, 705.5368, 933.4959, 25.757666, 797.1994, 881.42145, 785.15717, 992.5834]
2025-08-07 10:08:06,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:08:06,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 26 minutes, 39 seconds)
2025-08-07 10:09:47,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:09:58,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 848.48749 ± 110.920
2025-08-07 10:09:58,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [851.1451, 845.27826, 978.4256, 797.2084, 720.3565, 991.56494, 752.95355, 1019.7253, 680.66675, 847.55115]
2025-08-07 10:09:58,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:09:58,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 24 minutes, 46 seconds)
2025-08-07 10:11:39,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:11:51,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 720.29285 ± 378.138
2025-08-07 10:11:51,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [886.22095, 778.4163, 940.3798, 694.9382, 918.08203, 796.9098, 911.2428, -394.1086, 831.1676, 839.6798]
2025-08-07 10:11:51,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:11:51,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 22 minutes, 49 seconds)
2025-08-07 10:13:32,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:13:43,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 887.64667 ± 148.091
2025-08-07 10:13:43,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [705.10974, 921.374, 821.46576, 952.4249, 1207.2445, 782.60535, 797.04236, 1058.0284, 903.55164, 727.6203]
2025-08-07 10:13:43,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:13:43,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 20 minutes, 50 seconds)
2025-08-07 10:15:24,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:35,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 813.55872 ± 41.088
2025-08-07 10:15:35,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [808.0984, 865.0201, 851.8016, 808.3239, 809.66547, 733.7146, 837.3274, 746.0645, 841.62933, 833.94226]
2025-08-07 10:15:35,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:15:35,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 18 minutes, 45 seconds)
2025-08-07 10:17:15,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:27,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 899.41779 ± 103.028
2025-08-07 10:17:27,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [869.2962, 1136.9808, 912.71155, 820.6314, 801.87213, 847.21655, 848.63544, 925.5663, 1030.5052, 800.7628]
2025-08-07 10:17:27,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:17:27,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (899.42) for latency MM1Queue_a033_s075
2025-08-07 10:17:27,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 16 minutes, 28 seconds)
2025-08-07 10:19:06,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:18,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1028.22327 ± 206.247
2025-08-07 10:19:18,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1291.7588, 894.38684, 795.74554, 1058.7849, 1444.4371, 952.1309, 842.89087, 1196.5815, 832.62726, 972.8884]
2025-08-07 10:19:18,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:19:18,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1028.22) for latency MM1Queue_a033_s075
2025-08-07 10:19:18,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 14 minutes, 14 seconds)
2025-08-07 10:20:57,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:08,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 909.92041 ± 117.214
2025-08-07 10:21:08,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [914.77094, 1092.6864, 890.70557, 940.85706, 788.972, 946.56335, 747.8311, 1039.6436, 1010.2719, 726.9024]
2025-08-07 10:21:08,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:21:08,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 11 minutes, 54 seconds)
2025-08-07 10:22:47,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:59,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 901.36877 ± 130.841
2025-08-07 10:22:59,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [884.7504, 779.1947, 1252.8478, 962.57794, 789.3487, 811.0809, 935.9211, 822.4196, 877.54376, 898.00226]
2025-08-07 10:22:59,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:22:59,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 9 minutes, 36 seconds)
2025-08-07 10:24:38,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:49,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 950.20331 ± 219.508
2025-08-07 10:24:49,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1001.36237, 890.7855, 746.6146, 1449.895, 766.6316, 1146.9896, 742.52936, 713.84094, 983.5585, 1059.8251]
2025-08-07 10:24:49,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:24:49,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 7 minutes, 26 seconds)
2025-08-07 10:26:29,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:40,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 997.67761 ± 155.288
2025-08-07 10:26:40,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [999.8511, 915.82465, 1289.3914, 946.7866, 1253.8815, 858.0663, 898.3081, 1103.8209, 857.46625, 853.3806]
2025-08-07 10:26:40,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:26:40,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 5 minutes, 26 seconds)
2025-08-07 10:28:19,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:31,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1042.05945 ± 152.874
2025-08-07 10:28:31,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1079.1229, 956.3244, 942.3317, 1332.9171, 1053.0608, 1280.6185, 894.5594, 932.1067, 851.27545, 1098.278]
2025-08-07 10:28:31,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:28:31,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1042.06) for latency MM1Queue_a033_s075
2025-08-07 10:28:31,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 3 minutes, 27 seconds)
2025-08-07 10:30:10,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:21,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1104.99536 ± 255.180
2025-08-07 10:30:21,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1444.7968, 812.1811, 1027.891, 1010.79, 1181.1136, 1099.2715, 1098.3081, 804.1716, 1655.4824, 915.947]
2025-08-07 10:30:21,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:30:21,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1105.00) for latency MM1Queue_a033_s075
2025-08-07 10:30:21,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 1 minute, 37 seconds)
2025-08-07 10:32:00,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:12,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1059.70520 ± 227.511
2025-08-07 10:32:12,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [738.4959, 983.3222, 925.6568, 860.64856, 1373.9923, 998.82996, 853.6078, 1114.501, 1426.3829, 1321.6145]
2025-08-07 10:32:12,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:32:12,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 59 minutes, 46 seconds)
2025-08-07 10:33:51,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:02,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1103.51489 ± 238.005
2025-08-07 10:34:02,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1037.366, 1686.5251, 893.4272, 906.5969, 1031.5526, 1154.7083, 969.8045, 1213.0671, 848.0188, 1294.0834]
2025-08-07 10:34:02,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:34:02,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 57 minutes, 52 seconds)
2025-08-07 10:35:40,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:52,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1293.29675 ± 381.102
2025-08-07 10:35:52,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1544.5302, 1442.3136, 1522.6683, 883.69275, 1022.0542, 1920.5587, 1808.1144, 1026.5217, 837.17163, 925.3418]
2025-08-07 10:35:52,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:35:52,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1293.30) for latency MM1Queue_a033_s075
2025-08-07 10:35:52,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 55 minutes, 53 seconds)
2025-08-07 10:37:30,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:42,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1145.40295 ± 215.245
2025-08-07 10:37:42,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1105.4246, 816.24634, 1089.6873, 1053.9321, 1099.4727, 1529.0087, 1158.1042, 852.5594, 1347.7112, 1401.8829]
2025-08-07 10:37:42,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:37:42,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 53 minutes, 55 seconds)
2025-08-07 10:39:20,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:32,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1041.63940 ± 220.778
2025-08-07 10:39:32,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [823.2118, 1247.089, 874.3913, 755.12866, 1440.607, 1223.8307, 821.04504, 1202.7268, 916.816, 1111.5481]
2025-08-07 10:39:32,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:39:32,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 51 minutes, 57 seconds)
2025-08-07 10:41:10,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:22,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 990.79700 ± 219.575
2025-08-07 10:41:22,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1061.0356, 1066.5675, 913.61847, 956.3489, 911.3782, 1001.4609, 979.2307, 863.2924, 1539.2881, 615.74896]
2025-08-07 10:41:22,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:41:22,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 49 minutes, 58 seconds)
2025-08-07 10:43:00,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1129.38733 ± 305.631
2025-08-07 10:43:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [850.771, 1655.96, 934.10693, 904.5616, 1261.3043, 1406.5077, 1572.6218, 786.08923, 1073.409, 848.542]
2025-08-07 10:43:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:43:11,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 48 minutes, 3 seconds)
2025-08-07 10:44:50,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:01,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1268.07495 ± 529.227
2025-08-07 10:45:01,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2448.711, 934.0577, 2127.1821, 1080.8549, 1031.8331, 1146.8656, 1207.5596, 815.13184, 815.4097, 1073.1443]
2025-08-07 10:45:01,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:45:01,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 46 minutes, 13 seconds)
2025-08-07 10:46:40,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:51,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1086.54663 ± 198.879
2025-08-07 10:46:51,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1168.7902, 862.8937, 1355.7692, 1471.3495, 933.17706, 1133.7274, 984.1767, 1095.223, 1058.6036, 801.7552]
2025-08-07 10:46:51,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:46:51,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 44 minutes, 23 seconds)
2025-08-07 10:48:30,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:41,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1168.40894 ± 333.691
2025-08-07 10:48:41,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [931.2635, 811.3801, 1202.6971, 802.2028, 845.04694, 1086.147, 1365.4657, 1199.7539, 1588.328, 1851.8036]
2025-08-07 10:48:41,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:48:41,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 42 minutes, 33 seconds)
2025-08-07 10:50:20,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:31,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1258.74414 ± 364.331
2025-08-07 10:50:31,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1116.5951, 1279.1067, 2207.0044, 1143.771, 1355.2971, 1118.3457, 919.28296, 966.77655, 954.13354, 1527.1277]
2025-08-07 10:50:31,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:50:31,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 40 minutes, 43 seconds)
2025-08-07 10:52:10,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:21,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1443.37793 ± 514.974
2025-08-07 10:52:21,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1325.08, 762.90137, 1558.2462, 1781.275, 1464.4221, 2685.5027, 798.1768, 1557.7747, 1239.0481, 1261.3528]
2025-08-07 10:52:21,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:52:21,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1443.38) for latency MM1Queue_a033_s075
2025-08-07 10:52:21,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 38 minutes, 55 seconds)
2025-08-07 10:54:00,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:11,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1315.95984 ± 387.518
2025-08-07 10:54:11,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2025.3357, 1562.3002, 1052.1191, 835.3963, 1549.9515, 1002.2039, 748.3419, 1225.4758, 1665.9849, 1492.4894]
2025-08-07 10:54:11,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:54:11,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 37 minutes, 7 seconds)
2025-08-07 10:55:50,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:01,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1245.06580 ± 261.418
2025-08-07 10:56:01,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1144.9696, 1648.7532, 1725.1577, 998.87244, 1140.2318, 1163.3645, 891.84174, 1339.4695, 1020.5018, 1377.4965]
2025-08-07 10:56:01,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:56:01,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 35 minutes, 19 seconds)
2025-08-07 10:57:40,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1086.93872 ± 178.445
2025-08-07 10:57:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1155.2897, 886.0639, 879.30786, 1307.6816, 1231.0159, 1030.1423, 1239.3633, 1185.029, 1202.3348, 753.15826]
2025-08-07 10:57:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:57:51,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 33 minutes, 29 seconds)
2025-08-07 10:59:30,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:41,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 941.84589 ± 227.396
2025-08-07 10:59:41,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [956.36847, 1559.3204, 885.2074, 768.40845, 889.7984, 786.7071, 771.93726, 1102.6544, 806.7166, 891.34076]
2025-08-07 10:59:41,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:59:41,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 31 minutes, 39 seconds)
2025-08-07 11:01:20,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:31,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1214.92712 ± 468.877
2025-08-07 11:01:31,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1018.7587, 807.0267, 835.07666, 878.1704, 1772.6925, 815.6187, 1682.9866, 2161.3047, 1332.5753, 845.0619]
2025-08-07 11:01:31,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:01:31,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 29 minutes, 47 seconds)
2025-08-07 11:03:09,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:21,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 953.77429 ± 135.754
2025-08-07 11:03:21,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [769.66174, 1032.9996, 985.69525, 814.0201, 840.28357, 1024.2218, 1094.0135, 876.96106, 874.3596, 1225.5273]
2025-08-07 11:03:21,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:03:21,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 27 minutes, 56 seconds)
2025-08-07 11:04:59,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:11,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1311.42822 ± 618.196
2025-08-07 11:05:11,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [744.3336, 848.4266, 1415.2325, 1283.9219, 355.95053, 2677.3523, 1960.073, 1228.2769, 1484.1821, 1116.534]
2025-08-07 11:05:11,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:05:11,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 26 minutes, 5 seconds)
2025-08-07 11:06:49,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:01,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1326.22925 ± 266.489
2025-08-07 11:07:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1279.047, 1585.7562, 1172.3191, 979.8528, 946.38196, 1526.2402, 1111.9132, 1770.0712, 1301.6183, 1589.093]
2025-08-07 11:07:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:07:01,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 24 minutes, 14 seconds)
2025-08-07 11:08:39,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:51,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1211.82983 ± 457.408
2025-08-07 11:08:51,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2538.5464, 1204.8749, 930.0714, 875.14105, 995.1053, 1126.3566, 1155.0171, 1191.9174, 929.42535, 1171.843]
2025-08-07 11:08:51,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:08:51,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 22 minutes, 26 seconds)
2025-08-07 11:10:29,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:41,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1111.27319 ± 328.815
2025-08-07 11:10:41,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [901.34314, 849.21423, 1002.7092, 1235.399, 1990.192, 901.48883, 1056.9805, 1291.4972, 1068.4047, 815.5041]
2025-08-07 11:10:41,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:10:41,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 20 minutes, 37 seconds)
2025-08-07 11:12:19,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:30,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1168.81067 ± 381.074
2025-08-07 11:12:30,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [978.0057, 886.74896, 932.3235, 1983.3569, 997.25684, 1862.6003, 1068.9855, 966.22833, 1039.475, 973.126]
2025-08-07 11:12:30,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:12:30,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 18 minutes, 47 seconds)
2025-08-07 11:14:09,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:20,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1150.27148 ± 163.336
2025-08-07 11:14:20,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [956.29095, 862.66034, 1385.0558, 1072.2573, 1205.7826, 1102.2847, 1283.7107, 1391.2449, 1155.9048, 1087.5236]
2025-08-07 11:14:20,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:14:20,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 16 minutes, 58 seconds)
2025-08-07 11:15:59,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:10,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1353.19702 ± 391.287
2025-08-07 11:16:10,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1175.9705, 1431.315, 1307.0398, 2420.6138, 953.7956, 1299.388, 1293.9075, 1355.694, 1372.027, 922.219]
2025-08-07 11:16:10,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:16:10,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 15 minutes, 7 seconds)
2025-08-07 11:17:49,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:00,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1484.69604 ± 514.805
2025-08-07 11:18:00,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [797.1435, 1609.8962, 1312.5919, 1756.0536, 1691.9232, 2489.6907, 2066.503, 879.93585, 984.254, 1258.9678]
2025-08-07 11:18:00,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:18:00,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1484.70) for latency MM1Queue_a033_s075
2025-08-07 11:18:00,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 13 minutes, 16 seconds)
2025-08-07 11:19:39,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:50,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1436.23926 ± 341.857
2025-08-07 11:19:50,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1639.6014, 1259.9392, 1166.3495, 1137.0133, 1475.1306, 955.2398, 2115.9487, 1631.9733, 1802.6458, 1178.5518]
2025-08-07 11:19:50,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:19:50,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 11 minutes, 26 seconds)
2025-08-07 11:21:29,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:21:40,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1226.29761 ± 409.197
2025-08-07 11:21:40,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [782.8857, 1147.5947, 1451.1, 1364.43, 834.73914, 791.2993, 1256.5953, 1878.8412, 836.0559, 1919.4343]
2025-08-07 11:21:40,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:21:40,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 9 minutes, 36 seconds)
2025-08-07 11:23:19,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:23:30,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1369.08899 ± 402.025
2025-08-07 11:23:30,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1942.3033, 2207.598, 1220.3899, 1332.4026, 1392.285, 1006.6854, 1474.6971, 1046.4515, 822.5278, 1245.5498]
2025-08-07 11:23:30,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:23:30,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 7 minutes, 45 seconds)
2025-08-07 11:25:08,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:20,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1298.96338 ± 444.801
2025-08-07 11:25:20,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1091.3683, 926.13257, 863.7245, 1809.876, 2045.1201, 907.412, 1969.8313, 1270.0785, 874.15076, 1231.9402]
2025-08-07 11:25:20,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:25:20,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 5 minutes, 55 seconds)
2025-08-07 11:26:58,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:09,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1477.62085 ± 487.849
2025-08-07 11:27:09,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1130.2902, 1322.1687, 1543.3649, 1348.6373, 1506.8474, 966.1967, 2592.3398, 1584.0205, 1975.5198, 806.82294]
2025-08-07 11:27:09,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:27:10,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 4 minutes, 5 seconds)
2025-08-07 11:28:48,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:59,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1250.18140 ± 264.942
2025-08-07 11:28:59,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1261.7866, 1043.8584, 967.11633, 1271.4414, 910.2152, 1406.2828, 1374.8313, 1819.9115, 1443.561, 1002.8101]
2025-08-07 11:28:59,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:28:59,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 2 minutes, 15 seconds)
2025-08-07 11:30:38,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:49,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1356.90491 ± 488.056
2025-08-07 11:30:49,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [883.5258, 890.8226, 1107.1243, 1579.6145, 1002.3704, 2471.3435, 1294.7025, 1283.8624, 1084.3542, 1971.3287]
2025-08-07 11:30:49,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:30:49,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 24 seconds)
2025-08-07 11:32:28,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:39,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1094.94373 ± 414.056
2025-08-07 11:32:39,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [986.66205, 863.3513, 1088.823, 860.55994, 2282.0938, 870.08136, 903.0296, 1018.6273, 1244.3663, 831.84314]
2025-08-07 11:32:39,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:32:39,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 58 minutes, 34 seconds)
2025-08-07 11:34:18,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:29,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1237.73889 ± 375.094
2025-08-07 11:34:29,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2028.4858, 984.3355, 1009.7281, 1075.9033, 1047.683, 1099.7407, 1030.1809, 1101.0836, 1938.6846, 1061.5635]
2025-08-07 11:34:29,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:34:29,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 56 minutes, 44 seconds)
2025-08-07 11:36:08,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:19,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1429.20581 ± 452.223
2025-08-07 11:36:19,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1519.4885, 1654.1135, 2385.9988, 803.2714, 1092.7002, 1158.3627, 1669.3491, 1821.9873, 1259.2177, 927.5694]
2025-08-07 11:36:19,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:36:19,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 54 minutes, 56 seconds)
2025-08-07 11:37:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:09,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1238.18604 ± 394.874
2025-08-07 11:38:09,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1130.5795, 1928.5645, 834.4042, 1204.1494, 1089.2517, 903.7315, 958.2254, 1468.7836, 1964.0947, 900.0758]
2025-08-07 11:38:09,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:38:09,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 53 minutes, 6 seconds)
2025-08-07 11:39:47,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:59,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1047.36902 ± 150.235
2025-08-07 11:39:59,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1332.5479, 1137.0587, 988.6011, 1025.9814, 855.5562, 1233.0851, 961.63477, 945.1632, 858.6916, 1135.3704]
2025-08-07 11:39:59,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:39:59,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 51 minutes, 17 seconds)
2025-08-07 11:41:37,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:49,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1145.91431 ± 375.530
2025-08-07 11:41:49,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [929.552, 953.0439, 931.3193, 920.4855, 1822.5114, 1304.2301, 931.0408, 844.415, 1895.3165, 927.22864]
2025-08-07 11:41:49,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:41:49,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 49 minutes, 27 seconds)
2025-08-07 11:43:27,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:39,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1749.06763 ± 456.469
2025-08-07 11:43:39,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1060.2468, 2072.791, 2504.2527, 2183.3882, 1240.8375, 1466.1418, 1580.6392, 2024.8745, 1296.1552, 2061.3486]
2025-08-07 11:43:39,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:43:39,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1749.07) for latency MM1Queue_a033_s075
2025-08-07 11:43:39,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 47 minutes, 38 seconds)
2025-08-07 11:45:17,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:28,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1527.24316 ± 464.206
2025-08-07 11:45:28,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1647.1982, 1813.8707, 844.3561, 1347.7603, 771.5177, 1769.1705, 1576.9862, 2471.9092, 1382.6288, 1647.0332]
2025-08-07 11:45:28,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:45:28,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 45 minutes, 47 seconds)
2025-08-07 11:47:07,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:18,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1234.85217 ± 465.936
2025-08-07 11:47:18,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1106.183, 920.803, 1581.9073, 1563.0844, 921.20715, 831.188, 2412.7058, 892.0151, 1054.5508, 1064.8772]
2025-08-07 11:47:18,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:47:18,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 43 minutes, 58 seconds)
2025-08-07 11:48:57,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:08,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1413.72925 ± 445.946
2025-08-07 11:49:08,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1277.3649, 1556.2283, 857.419, 1918.8778, 1377.6294, 1025.1351, 1919.3387, 1247.2045, 2164.943, 793.152]
2025-08-07 11:49:08,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:49:08,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 42 minutes, 8 seconds)
2025-08-07 11:50:47,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:58,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1458.49341 ± 422.562
2025-08-07 11:50:58,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1935.9902, 1345.8057, 1096.3435, 1087.7365, 1009.02277, 1515.2367, 1702.4716, 2053.337, 847.38354, 1991.6052]
2025-08-07 11:50:58,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:50:58,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 40 minutes, 18 seconds)
2025-08-07 11:52:37,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:48,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1242.24780 ± 264.607
2025-08-07 11:52:48,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1738.4767, 976.28174, 1056.4563, 1640.6652, 1315.7947, 909.7091, 1012.4936, 1291.4828, 1143.7963, 1337.322]
2025-08-07 11:52:48,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:52:48,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 38 minutes, 28 seconds)
2025-08-07 11:54:27,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:38,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1724.85254 ± 555.660
2025-08-07 11:54:38,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1583.8187, 2365.369, 2426.7812, 974.9668, 834.4833, 2284.3367, 1157.8961, 2118.1094, 1849.1592, 1653.6058]
2025-08-07 11:54:38,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:54:38,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 36 minutes, 38 seconds)
2025-08-07 11:56:16,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:28,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1549.36218 ± 437.789
2025-08-07 11:56:28,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [923.52344, 1521.7355, 2416.3923, 1522.8488, 1242.766, 1919.4512, 1449.3475, 2083.9897, 1135.4031, 1278.1643]
2025-08-07 11:56:28,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:56:28,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 34 minutes, 48 seconds)
2025-08-07 11:58:06,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:18,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1603.42859 ± 583.419
2025-08-07 11:58:18,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [935.47363, 1574.553, 914.6749, 1085.0664, 2292.828, 2072.6619, 2259.6897, 2413.4336, 950.058, 1535.8477]
2025-08-07 11:58:18,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:58:18,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 58 seconds)
2025-08-07 11:59:57,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:08,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1579.95239 ± 508.673
2025-08-07 12:00:08,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1744.854, 1970.0063, 1491.9326, 979.55865, 1128.6827, 2678.2085, 2031.2618, 989.54535, 1330.4417, 1455.0331]
2025-08-07 12:00:08,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:00:08,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 31 minutes, 8 seconds)
2025-08-07 12:01:46,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:58,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1438.04907 ± 465.031
2025-08-07 12:01:58,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [878.9387, 1028.8405, 1564.2937, 2153.7717, 1135.5905, 1739.7345, 2202.5317, 1059.6278, 984.05817, 1633.102]
2025-08-07 12:01:58,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:01:58,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 19 seconds)
2025-08-07 12:03:36,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:48,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1411.67957 ± 494.728
2025-08-07 12:03:48,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2320.9592, 918.4275, 2133.6465, 1104.8262, 1091.5824, 879.988, 1715.4552, 1073.6537, 1148.5544, 1729.7025]
2025-08-07 12:03:48,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:03:48,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 29 seconds)
2025-08-07 12:05:26,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:38,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1614.55176 ± 511.302
2025-08-07 12:05:38,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1892.2025, 2195.1675, 1250.4768, 2120.917, 1034.2472, 1481.156, 1235.1313, 1252.6876, 2576.079, 1107.452]
2025-08-07 12:05:38,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:05:38,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 39 seconds)
2025-08-07 12:07:16,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:28,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1268.59595 ± 344.308
2025-08-07 12:07:28,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1110.3954, 1424.8701, 1069.0276, 1350.1665, 893.3831, 928.6927, 1556.9868, 1631.6846, 812.0237, 1908.7291]
2025-08-07 12:07:28,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:07:28,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 50 seconds)
2025-08-07 12:09:06,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:18,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1522.60388 ± 396.962
2025-08-07 12:09:18,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1204.1686, 1026.0325, 934.1561, 1924.7699, 1733.4989, 1947.5795, 2156.9436, 1644.9374, 1306.7456, 1347.2069]
2025-08-07 12:09:18,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:09:18,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 59 seconds)
2025-08-07 12:10:56,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:08,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1483.27576 ± 442.246
2025-08-07 12:11:08,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [856.6558, 1965.1008, 1561.0652, 1050.0793, 1028.9382, 1942.6826, 2088.8088, 1568.5409, 981.2558, 1789.6301]
2025-08-07 12:11:08,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:11:08,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 9 seconds)
2025-08-07 12:12:46,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:58,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1528.86523 ± 516.947
2025-08-07 12:12:58,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2452.1367, 1099.3391, 1129.5039, 1063.1179, 1434.2733, 2469.8672, 1167.0184, 1613.6332, 1108.278, 1751.4847]
2025-08-07 12:12:58,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:12:58,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 20 seconds)
2025-08-07 12:14:37,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:48,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1422.03455 ± 400.439
2025-08-07 12:14:48,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1483.2611, 1730.3645, 1622.3513, 1064.579, 974.2946, 1258.6782, 2345.0334, 1263.9927, 951.09503, 1526.6958]
2025-08-07 12:14:48,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:14:48,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 30 seconds)
2025-08-07 12:16:27,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:38,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1565.49097 ± 433.882
2025-08-07 12:16:38,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1570.2406, 1881.241, 1033.3186, 2280.2537, 1478.6301, 2313.231, 1354.3307, 1143.9835, 1128.5742, 1471.1075]
2025-08-07 12:16:38,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:16:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 40 seconds)
2025-08-07 12:18:17,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:28,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1336.43994 ± 534.898
2025-08-07 12:18:28,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1639.5814, 898.59344, 990.4386, 1021.57526, 2440.1648, 909.66705, 1857.7725, 829.0192, 1849.0166, 928.5703]
2025-08-07 12:18:28,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:18:28,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 50 seconds)
2025-08-07 12:20:06,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:18,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1709.51367 ± 424.173
2025-08-07 12:20:18,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2303.3457, 1485.2458, 998.6503, 1389.6637, 2163.9014, 2113.8486, 1892.0345, 1159.7732, 1608.8832, 1979.7913]
2025-08-07 12:20:18,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:20:18,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 59 seconds)
2025-08-07 12:21:56,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:08,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1387.81738 ± 489.732
2025-08-07 12:22:08,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2163.391, 1250.4786, 2248.063, 1087.572, 1800.8934, 1246.3905, 1400.4281, 1013.41254, 795.5382, 872.0065]
2025-08-07 12:22:08,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:22:08,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 9 seconds)
2025-08-07 12:23:46,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:58,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1505.84399 ± 526.513
2025-08-07 12:23:58,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1856.8386, 2075.2546, 884.7094, 1075.7799, 844.1276, 998.3499, 1576.2421, 2016.039, 2382.5212, 1348.5758]
2025-08-07 12:23:58,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:23:58,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 19 seconds)
2025-08-07 12:25:36,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:48,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2026.23376 ± 523.820
2025-08-07 12:25:48,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2613.3079, 2560.892, 2703.6272, 1341.3961, 1362.8889, 1868.8956, 2414.7183, 1915.0245, 2184.9055, 1296.6824]
2025-08-07 12:25:48,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:25:48,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (2026.23) for latency MM1Queue_a033_s075
2025-08-07 12:25:48,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 29 seconds)
2025-08-07 12:27:26,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:27:38,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1334.80847 ± 456.116
2025-08-07 12:27:38,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2322.396, 775.7515, 997.1482, 1487.5459, 767.38104, 1552.4196, 1209.349, 1066.6509, 1776.18, 1393.2627]
2025-08-07 12:27:38,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:27:38,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 39 seconds)
2025-08-07 12:29:17,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:29:28,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1483.20312 ± 526.327
2025-08-07 12:29:28,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1447.8369, 2391.1667, 1104.389, 1936.519, 2085.6458, 1025.3665, 955.88385, 873.8212, 1937.3138, 1074.0894]
2025-08-07 12:29:28,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:29:28,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 50 seconds)
2025-08-07 12:31:06,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:31:18,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1701.97913 ± 373.156
2025-08-07 12:31:18,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1210.2562, 2247.7637, 1925.7952, 1734.7688, 1658.9474, 1590.4465, 1093.1653, 2331.8762, 1567.5812, 1659.1902]
2025-08-07 12:31:18,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:31:18,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1251 [DEBUG]: Training session finished
