2025-08-07 10:51:46,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc20-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:51:46,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc20-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:51:46,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14d1d532ff50>}
2025-08-07 10:51:46,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 10:51:46,509 baseline-bpql-noiseperc20-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:51:46,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 10:51:46,526 baseline-bpql-noiseperc20-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 10:51:46,526 baseline-bpql-noiseperc20-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:51:47,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 10:51:47,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 10:53:15,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:16,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 5.72724 ± 7.349
2025-08-07 10:53:16,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.6793027, -0.8382942, -4.4423327, 6.3756413, 0.33355948, 8.8769655, 2.8234704, 9.909016, 23.81706, 4.738064]
2025-08-07 10:53:16,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 61.0, 23.0, 72.0, 13.0, 28.0, 16.0, 70.0, 59.0, 56.0]
2025-08-07 10:53:16,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (5.73) for latency MM1Queue_a033_s075
2025-08-07 10:53:16,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 26 minutes, 36 seconds)
2025-08-07 10:54:53,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:54,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 52.39732 ± 75.868
2025-08-07 10:54:54,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [137.52419, 35.085835, 108.989815, 6.340683, 0.41385114, 1.3799015, 227.70416, 15.384576, 15.870921, -24.720758]
2025-08-07 10:54:54,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 45.0, 85.0, 38.0, 12.0, 17.0, 133.0, 28.0, 27.0, 138.0]
2025-08-07 10:54:54,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (52.40) for latency MM1Queue_a033_s075
2025-08-07 10:54:54,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 32 minutes, 39 seconds)
2025-08-07 10:56:31,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:31,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 42.37959 ± 57.378
2025-08-07 10:56:31,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [32.508545, 1.8445847, 165.88863, 16.163689, 2.2692308, 4.956122, 24.493448, 3.9222002, 143.52823, 28.221151]
2025-08-07 10:56:31,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [47.0, 18.0, 161.0, 68.0, 17.0, 17.0, 44.0, 16.0, 175.0, 44.0]
2025-08-07 10:56:31,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 33 minutes, 12 seconds)
2025-08-07 10:58:07,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:08,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 18.72332 ± 21.955
2025-08-07 10:58:08,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.47326756, 2.4601424, 38.35281, 14.464362, 9.104091, -5.4270735, 54.766537, 58.067924, 4.133277, 11.784416]
2025-08-07 10:58:08,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [10.0, 12.0, 60.0, 46.0, 21.0, 157.0, 76.0, 67.0, 18.0, 39.0]
2025-08-07 10:58:08,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 32 minutes, 21 seconds)
2025-08-07 10:59:45,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:45,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 28.60464 ± 43.129
2025-08-07 10:59:45,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [11.776673, 40.86821, 14.451223, -1.1663506, 31.030607, 16.119566, 13.732657, 3.3726969, 3.1142485, 152.74687]
2025-08-07 10:59:45,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 90.0, 24.0, 14.0, 51.0, 99.0, 25.0, 17.0, 15.0, 116.0]
2025-08-07 10:59:45,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 31 minutes, 27 seconds)
2025-08-07 11:01:21,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:22,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 89.52301 ± 98.666
2025-08-07 11:01:22,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.864539, 60.196953, 203.37633, 13.66827, 75.01926, 57.496212, 336.9378, 73.95281, 65.88186, 1.8360629]
2025-08-07 11:01:22,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 57.0, 137.0, 27.0, 68.0, 117.0, 223.0, 112.0, 67.0, 15.0]
2025-08-07 11:01:22,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (89.52) for latency MM1Queue_a033_s075
2025-08-07 11:01:22,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 32 minutes, 25 seconds)
2025-08-07 11:02:59,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:59,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 27.74362 ± 30.798
2025-08-07 11:02:59,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.4000918, 15.936358, 26.17287, 3.4904795, 109.430374, 55.434258, 14.570254, 9.966511, 24.217934, 16.817062]
2025-08-07 11:02:59,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 30.0, 144.0, 18.0, 80.0, 78.0, 27.0, 23.0, 47.0, 28.0]
2025-08-07 11:02:59,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 30 minutes, 25 seconds)
2025-08-07 11:04:36,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:36,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 71.12672 ± 64.314
2025-08-07 11:04:36,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [23.715662, 33.601112, 180.73125, 141.28851, 176.35161, 62.40476, 26.830858, 25.594294, 5.6206746, 35.1285]
2025-08-07 11:04:36,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [37.0, 43.0, 95.0, 80.0, 114.0, 63.0, 34.0, 39.0, 18.0, 42.0]
2025-08-07 11:04:36,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 28 minutes, 45 seconds)
2025-08-07 11:06:13,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:14,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 59.66241 ± 70.386
2025-08-07 11:06:14,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [38.747704, 73.35833, 12.571796, 1.2929412, 206.83551, 50.729702, 5.1935635, 179.54723, 8.0658, 20.281578]
2025-08-07 11:06:14,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [53.0, 79.0, 28.0, 14.0, 102.0, 73.0, 16.0, 103.0, 23.0, 31.0]
2025-08-07 11:06:14,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 27 minutes, 19 seconds)
2025-08-07 11:07:51,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:52,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 37.89014 ± 46.147
2025-08-07 11:07:52,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [14.653007, 67.07331, 0.74709576, 4.1679807, 5.542197, 20.79775, 129.62407, 115.25979, 19.144299, 1.8919686]
2025-08-07 11:07:52,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 65.0, 16.0, 16.0, 16.0, 32.0, 88.0, 82.0, 31.0, 15.0]
2025-08-07 11:07:52,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 25 minutes, 55 seconds)
2025-08-07 11:09:29,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:30,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 76.79848 ± 50.087
2025-08-07 11:09:30,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [27.70304, 78.4937, 147.81479, 69.767426, 3.3182275, 0.49125698, 143.14188, 100.37507, 83.19787, 113.68161]
2025-08-07 11:09:30,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [35.0, 130.0, 98.0, 73.0, 14.0, 14.0, 132.0, 95.0, 74.0, 122.0]
2025-08-07 11:09:30,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 24 minutes, 38 seconds)
2025-08-07 11:11:06,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:11:06,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 59.74643 ± 72.544
2025-08-07 11:11:06,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [191.05748, 16.333267, 21.552406, 107.36579, -0.035815477, 13.567551, 13.210861, 37.264626, 194.43713, 2.7109952]
2025-08-07 11:11:06,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [122.0, 27.0, 42.0, 109.0, 12.0, 28.0, 22.0, 45.0, 117.0, 14.0]
2025-08-07 11:11:06,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 22 minutes, 54 seconds)
2025-08-07 11:12:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:44,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 57.89003 ± 78.533
2025-08-07 11:12:44,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [254.58191, 31.450762, 155.48907, 2.5025427, 7.45976, 0.54595095, 31.434462, 19.650703, 59.109505, 16.675644]
2025-08-07 11:12:44,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [138.0, 38.0, 119.0, 13.0, 18.0, 11.0, 43.0, 28.0, 57.0, 45.0]
2025-08-07 11:12:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 21 minutes, 30 seconds)
2025-08-07 11:14:21,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:21,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 42.00310 ± 33.929
2025-08-07 11:14:21,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [30.791178, 3.2098908, 45.933277, 65.694336, 12.161897, 12.515937, 21.820251, 27.360489, 92.15063, 108.3931]
2025-08-07 11:14:21,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [40.0, 15.0, 48.0, 87.0, 22.0, 30.0, 35.0, 34.0, 122.0, 120.0]
2025-08-07 11:14:21,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 19 minutes, 48 seconds)
2025-08-07 11:15:58,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:59,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 138.12334 ± 123.398
2025-08-07 11:15:59,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.445582, 124.85389, 45.808704, 116.91235, 24.439903, 51.109604, 387.2175, 161.76463, 342.24716, 123.434074]
2025-08-07 11:15:59,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 84.0, 116.0, 87.0, 38.0, 76.0, 298.0, 130.0, 220.0, 82.0]
2025-08-07 11:15:59,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (138.12) for latency MM1Queue_a033_s075
2025-08-07 11:15:59,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 18 minutes, 7 seconds)
2025-08-07 11:17:36,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:37,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 41.84539 ± 60.572
2025-08-07 11:17:37,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [7.199899, 190.67978, 7.35136, 42.894863, 20.440413, 2.232202, 22.316229, 122.347115, 0.72920674, 2.262856]
2025-08-07 11:17:37,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 140.0, 17.0, 50.0, 28.0, 14.0, 33.0, 165.0, 14.0, 17.0]
2025-08-07 11:17:37,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 16 minutes, 17 seconds)
2025-08-07 11:19:14,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:15,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 130.58064 ± 53.275
2025-08-07 11:19:15,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [175.73627, 66.817924, 132.95436, 157.92735, 31.594765, 121.825584, 79.60411, 155.4233, 213.1509, 170.7719]
2025-08-07 11:19:15,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 194.0, 74.0, 124.0, 37.0, 91.0, 64.0, 103.0, 170.0, 101.0]
2025-08-07 11:19:15,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 15 minutes, 10 seconds)
2025-08-07 11:20:52,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:53,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 120.66879 ± 104.796
2025-08-07 11:20:53,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [133.86908, 1.730757, 325.14758, 163.36554, 97.9949, 6.986192, 43.001904, 3.448145, 197.6426, 233.50117]
2025-08-07 11:20:53,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 12.0, 176.0, 99.0, 83.0, 18.0, 48.0, 16.0, 108.0, 126.0]
2025-08-07 11:20:53,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 13 minutes, 32 seconds)
2025-08-07 11:22:30,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:31,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 129.79558 ± 105.174
2025-08-07 11:22:31,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [45.0975, 107.50908, 5.6820283, 154.12802, 193.0319, 108.763985, 17.677332, 93.829796, 186.69194, 385.54413]
2025-08-07 11:22:31,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [55.0, 188.0, 18.0, 112.0, 96.0, 81.0, 59.0, 66.0, 104.0, 293.0]
2025-08-07 11:22:31,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 12 minutes, 11 seconds)
2025-08-07 11:24:08,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:09,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 112.72528 ± 76.405
2025-08-07 11:24:09,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [50.63865, -0.5669856, 222.0144, 210.8423, 40.346573, 140.4979, 142.38152, 194.27908, 92.66225, 34.15705]
2025-08-07 11:24:09,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [57.0, 10.0, 124.0, 129.0, 41.0, 109.0, 117.0, 162.0, 76.0, 40.0]
2025-08-07 11:24:09,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 10 minutes, 38 seconds)
2025-08-07 11:25:46,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:46,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 68.74430 ± 51.001
2025-08-07 11:25:46,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.5857375, 57.668518, 58.568386, 181.27504, 76.52811, 50.77245, -0.76122177, 124.89859, 51.564434, 83.342896]
2025-08-07 11:25:46,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 58.0, 52.0, 134.0, 65.0, 50.0, 14.0, 87.0, 54.0, 120.0]
2025-08-07 11:25:46,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 8 minutes, 57 seconds)
2025-08-07 11:27:24,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:25,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 139.25711 ± 82.393
2025-08-07 11:27:25,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [164.56342, 291.27774, 177.04765, 52.30326, 183.89462, 133.23746, 99.30008, 227.05841, 12.196279, 51.69231]
2025-08-07 11:27:25,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [112.0, 221.0, 107.0, 122.0, 92.0, 90.0, 73.0, 147.0, 38.0, 65.0]
2025-08-07 11:27:25,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (139.26) for latency MM1Queue_a033_s075
2025-08-07 11:27:25,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 7 minutes, 23 seconds)
2025-08-07 11:29:02,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:03,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 133.94260 ± 115.243
2025-08-07 11:29:03,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [305.18933, 94.20303, 17.331879, 4.306776, 85.65795, 187.25777, 91.945984, 350.33026, 13.150362, 190.05261]
2025-08-07 11:29:03,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [163.0, 85.0, 42.0, 14.0, 73.0, 157.0, 80.0, 461.0, 28.0, 153.0]
2025-08-07 11:29:03,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 5 minutes, 53 seconds)
2025-08-07 11:30:42,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:43,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 101.74245 ± 72.893
2025-08-07 11:30:43,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [91.815475, -0.48675343, 107.16065, 269.39642, 126.47796, 94.90175, 71.95341, 149.61336, -1.0892208, 107.68141]
2025-08-07 11:30:43,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [84.0, 12.0, 79.0, 172.0, 89.0, 73.0, 66.0, 133.0, 12.0, 114.0]
2025-08-07 11:30:43,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 4 minutes, 42 seconds)
2025-08-07 11:32:31,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:32,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 222.51982 ± 150.237
2025-08-07 11:32:32,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [294.83472, 132.7304, 526.56714, 3.5548124, 179.90742, 5.0412045, 309.25076, 215.89973, 211.4827, 345.92944]
2025-08-07 11:32:32,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 89.0, 462.0, 15.0, 101.0, 17.0, 194.0, 111.0, 137.0, 141.0]
2025-08-07 11:32:32,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (222.52) for latency MM1Queue_a033_s075
2025-08-07 11:32:32,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 5 minutes, 45 seconds)
2025-08-07 11:34:18,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:19,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 78.07608 ± 72.489
2025-08-07 11:34:19,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [185.73409, 87.382256, -3.132399, 120.32968, 93.73092, 4.0541496, 212.05922, 30.049217, 46.855305, 3.698444]
2025-08-07 11:34:19,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [102.0, 73.0, 13.0, 91.0, 64.0, 15.0, 132.0, 43.0, 50.0, 14.0]
2025-08-07 11:34:19,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 6 minutes, 19 seconds)
2025-08-07 11:36:06,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:07,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 92.57471 ± 96.671
2025-08-07 11:36:07,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [15.256486, 0.9821188, 146.40913, 311.61127, 91.46889, 207.39357, 76.63533, 20.219255, 52.045933, 3.7251005]
2025-08-07 11:36:07,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 17.0, 89.0, 183.0, 71.0, 89.0, 82.0, 27.0, 70.0, 16.0]
2025-08-07 11:36:07,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 6 minutes, 56 seconds)
2025-08-07 11:37:53,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:54,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 257.99640 ± 139.120
2025-08-07 11:37:54,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [319.94504, 508.07034, 428.702, 77.56635, 17.756727, 314.5932, 209.7311, 221.16783, 253.62083, 228.81056]
2025-08-07 11:37:54,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 232.0, 208.0, 92.0, 30.0, 150.0, 108.0, 165.0, 164.0, 132.0]
2025-08-07 11:37:54,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (258.00) for latency MM1Queue_a033_s075
2025-08-07 11:37:54,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 7 minutes, 24 seconds)
2025-08-07 11:39:42,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:43,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 246.10983 ± 216.106
2025-08-07 11:39:43,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [213.2062, 738.3898, 1.8823259, 275.89478, 435.69565, 2.0826685, 295.64423, 297.81778, 0.52156043, 199.96352]
2025-08-07 11:39:43,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [122.0, 421.0, 17.0, 154.0, 327.0, 15.0, 219.0, 155.0, 13.0, 101.0]
2025-08-07 11:39:43,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 7 minutes, 51 seconds)
2025-08-07 11:41:31,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:34,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 308.60312 ± 331.370
2025-08-07 11:41:34,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [89.24026, 89.47, 669.18774, 1.4014944, 281.44443, 184.56691, 233.21466, 374.30563, 32.806145, 1130.394]
2025-08-07 11:41:34,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [148.0, 104.0, 446.0, 13.0, 148.0, 138.0, 269.0, 251.0, 43.0, 999.0]
2025-08-07 11:41:34,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (308.60) for latency MM1Queue_a033_s075
2025-08-07 11:41:34,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 6 minutes, 26 seconds)
2025-08-07 11:43:21,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:22,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 162.73376 ± 125.335
2025-08-07 11:43:22,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [251.63924, 296.8725, 246.02924, 2.6738963, 37.788548, 3.7070684, 288.3761, 6.762456, 204.88216, 288.6064]
2025-08-07 11:43:22,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [155.0, 222.0, 179.0, 16.0, 66.0, 15.0, 214.0, 34.0, 152.0, 134.0]
2025-08-07 11:43:22,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 5 minutes, 2 seconds)
2025-08-07 11:45:10,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:11,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 161.63028 ± 166.835
2025-08-07 11:45:11,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [159.3752, 207.00142, 1.9822379, 269.14044, 476.43823, 1.8428203, 89.85267, 3.7882802, 2.1564808, 404.72495]
2025-08-07 11:45:11,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 99.0, 16.0, 152.0, 241.0, 13.0, 119.0, 14.0, 16.0, 271.0]
2025-08-07 11:45:11,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 3 minutes, 29 seconds)
2025-08-07 11:46:58,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:01,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 262.49026 ± 167.393
2025-08-07 11:47:01,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [313.5393, 6.3467736, 375.58408, 391.70575, 157.40614, 415.73483, 108.92126, 366.79617, 6.8846927, 481.98364]
2025-08-07 11:47:01,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [191.0, 16.0, 429.0, 173.0, 152.0, 288.0, 133.0, 203.0, 34.0, 432.0]
2025-08-07 11:47:01,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 2 minutes, 3 seconds)
2025-08-07 11:48:48,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:50,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 181.50792 ± 213.891
2025-08-07 11:48:50,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.065158, 95.84155, 55.417446, 142.1697, 198.39688, 60.733692, 361.21606, 3.0891273, 144.86516, 748.2844]
2025-08-07 11:48:50,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 114.0, 56.0, 146.0, 85.0, 105.0, 253.0, 13.0, 122.0, 607.0]
2025-08-07 11:48:50,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 9 seconds)
2025-08-07 11:50:38,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:40,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 186.12279 ± 145.296
2025-08-07 11:50:40,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [319.85825, -0.5238258, 230.55203, 40.96606, 390.23547, 29.63799, 249.82407, 400.18903, 52.490856, 147.998]
2025-08-07 11:50:40,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 14.0, 142.0, 108.0, 170.0, 121.0, 138.0, 235.0, 54.0, 133.0]
2025-08-07 11:50:40,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 58 minutes, 10 seconds)
2025-08-07 11:52:25,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:27,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 234.18668 ± 185.276
2025-08-07 11:52:27,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [9.377695, 34.388134, 421.41156, 426.46585, 181.66924, 3.8256054, 150.2974, 304.0953, 570.3507, 239.98546]
2025-08-07 11:52:27,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 41.0, 213.0, 211.0, 126.0, 15.0, 99.0, 151.0, 259.0, 139.0]
2025-08-07 11:52:27,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 56 minutes, 12 seconds)
2025-08-07 11:54:14,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:15,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 172.59961 ± 152.637
2025-08-07 11:54:15,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [38.072037, 1.1318487, 295.99945, 242.1532, 268.20392, 52.87024, 414.26825, 42.738476, 3.135772, 367.42294]
2025-08-07 11:54:15,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [48.0, 15.0, 157.0, 161.0, 140.0, 66.0, 221.0, 47.0, 16.0, 199.0]
2025-08-07 11:54:15,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 54 minutes, 14 seconds)
2025-08-07 11:56:03,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:04,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 171.12755 ± 180.548
2025-08-07 11:56:04,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [31.437365, 3.1654181, 349.13702, 23.08384, 41.591896, 14.407536, 325.37363, 350.93176, 61.617397, 510.52957]
2025-08-07 11:56:04,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [95.0, 12.0, 178.0, 61.0, 50.0, 50.0, 142.0, 174.0, 106.0, 376.0]
2025-08-07 11:56:04,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 52 minutes, 16 seconds)
2025-08-07 11:57:52,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:53,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 189.98300 ± 136.056
2025-08-07 11:57:53,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.6314397, 101.99307, 207.76, 235.8731, 251.22133, 43.05092, 311.81793, 3.916908, 384.4402, 356.12512]
2025-08-07 11:57:53,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 121.0, 112.0, 117.0, 115.0, 49.0, 156.0, 27.0, 174.0, 189.0]
2025-08-07 11:57:53,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 50 minutes, 35 seconds)
2025-08-07 11:59:41,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:43,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 251.35367 ± 115.873
2025-08-07 11:59:43,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [202.4411, 263.66772, 53.445934, 210.38077, 71.373215, 407.92642, 370.16452, 387.06046, 288.5287, 258.54785]
2025-08-07 11:59:43,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [275.0, 141.0, 89.0, 119.0, 69.0, 324.0, 203.0, 194.0, 189.0, 132.0]
2025-08-07 11:59:43,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 48 minutes, 45 seconds)
2025-08-07 12:01:31,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:33,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 186.22278 ± 186.204
2025-08-07 12:01:33,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [471.66205, 6.2887993, 166.28773, 3.462294, 362.74252, 317.31064, 3.3890796, 31.126326, 43.685886, 456.27243]
2025-08-07 12:01:33,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [308.0, 16.0, 145.0, 13.0, 150.0, 152.0, 16.0, 43.0, 89.0, 244.0]
2025-08-07 12:01:33,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 47 minutes, 20 seconds)
2025-08-07 12:03:20,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:22,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 169.22940 ± 165.910
2025-08-07 12:03:22,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.3481293, 53.842113, 363.21753, 67.92551, 93.27348, 349.51138, 0.61842686, 3.1597893, 346.7688, 413.6289]
2025-08-07 12:03:22,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 56.0, 249.0, 58.0, 106.0, 217.0, 14.0, 17.0, 169.0, 203.0]
2025-08-07 12:03:22,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 45 minutes, 37 seconds)
2025-08-07 12:05:10,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:10,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 33.28367 ± 65.511
2025-08-07 12:05:10,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.9380393, 37.70794, -0.37532824, 11.32877, 4.86536, 1.442778, 0.36592373, 224.13445, 6.1201296, 46.30868]
2025-08-07 12:05:10,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 46.0, 10.0, 28.0, 17.0, 13.0, 12.0, 116.0, 16.0, 47.0]
2025-08-07 12:05:10,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 43 minutes, 48 seconds)
2025-08-07 12:06:58,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:59,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 137.75711 ± 134.967
2025-08-07 12:06:59,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [316.3463, 65.1406, 211.0965, 29.40452, 0.64186114, 72.367294, 19.183722, 5.6083984, 339.15402, 318.62784]
2025-08-07 12:06:59,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 81.0, 145.0, 39.0, 25.0, 63.0, 46.0, 16.0, 181.0, 130.0]
2025-08-07 12:06:59,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 41 minutes, 53 seconds)
2025-08-07 12:08:46,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:48,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 290.88828 ± 163.421
2025-08-07 12:08:48,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [611.09576, 226.52217, 344.9518, 34.62126, 446.38474, 344.6041, 238.61232, 43.35373, 304.8748, 313.86227]
2025-08-07 12:08:48,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [320.0, 124.0, 157.0, 41.0, 236.0, 170.0, 130.0, 81.0, 171.0, 160.0]
2025-08-07 12:08:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 39 minutes, 49 seconds)
2025-08-07 12:10:37,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:39,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 231.89714 ± 285.104
2025-08-07 12:10:39,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [113.220474, 290.96625, 908.242, 20.421934, 442.51807, 477.44006, 1.5520444, 63.454662, 2.2831066, -1.1272637]
2025-08-07 12:10:39,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [104.0, 134.0, 443.0, 63.0, 197.0, 274.0, 12.0, 178.0, 14.0, 16.0]
2025-08-07 12:10:39,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 38 minutes, 17 seconds)
2025-08-07 12:12:25,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:26,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 157.36203 ± 186.122
2025-08-07 12:12:26,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.3636858, 298.1245, 5.0162187, 382.2496, 64.475204, 1.9804777, 1.521181, 310.27917, 1.8276116, 508.51004]
2025-08-07 12:12:26,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 134.0, 16.0, 198.0, 164.0, 15.0, 17.0, 153.0, 14.0, 313.0]
2025-08-07 12:12:26,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 36 minutes, 9 seconds)
2025-08-07 12:14:14,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:15,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 205.09961 ± 142.075
2025-08-07 12:14:15,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [409.0717, 224.95384, 58.70006, 129.42682, 5.0519176, 268.46634, 160.89938, 39.882168, 385.11172, 369.43237]
2025-08-07 12:14:15,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [235.0, 148.0, 58.0, 141.0, 14.0, 146.0, 265.0, 48.0, 213.0, 184.0]
2025-08-07 12:14:15,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 34 minutes, 26 seconds)
2025-08-07 12:16:03,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:05,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 290.26221 ± 187.680
2025-08-07 12:16:05,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [291.8544, -2.43923, 279.5201, 259.73776, 584.08136, 4.0836463, 302.51382, 422.53003, 558.4181, 202.32217]
2025-08-07 12:16:05,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 13.0, 139.0, 130.0, 265.0, 17.0, 167.0, 289.0, 254.0, 128.0]
2025-08-07 12:16:05,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 32 minutes, 44 seconds)
2025-08-07 12:17:52,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:53,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 124.79814 ± 152.757
2025-08-07 12:17:53,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.4831176, 228.05838, 5.5125227, 0.60188144, 370.83765, 6.8195148, 1.3614396, 351.6733, 275.2681, 2.3656242]
2025-08-07 12:17:53,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 135.0, 17.0, 13.0, 199.0, 26.0, 12.0, 153.0, 169.0, 14.0]
2025-08-07 12:17:53,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 30 minutes, 50 seconds)
2025-08-07 12:19:40,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:41,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 173.82936 ± 202.649
2025-08-07 12:19:41,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [390.88766, 11.416865, 10.719573, 28.102827, 0.6848295, 28.252516, 217.81296, 41.450844, 500.0146, 508.9509]
2025-08-07 12:19:41,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [280.0, 24.0, 27.0, 49.0, 16.0, 32.0, 112.0, 50.0, 265.0, 284.0]
2025-08-07 12:19:41,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 28 minutes, 32 seconds)
2025-08-07 12:21:30,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:31,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 172.97812 ± 114.504
2025-08-07 12:21:31,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [193.33769, 175.73314, 32.260853, 10.780797, 275.88586, 233.34842, 11.076643, 167.89684, 270.45258, 359.00836]
2025-08-07 12:21:31,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [109.0, 148.0, 62.0, 38.0, 157.0, 138.0, 53.0, 164.0, 155.0, 197.0]
2025-08-07 12:21:31,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 27 minutes, 12 seconds)
2025-08-07 12:23:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:19,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 256.60077 ± 164.323
2025-08-07 12:23:19,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.2866373, 345.13245, 250.06706, 293.48358, 600.5426, 292.6849, 175.13416, 293.50742, -1.11874, 312.28748]
2025-08-07 12:23:19,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 174.0, 179.0, 164.0, 267.0, 169.0, 106.0, 148.0, 13.0, 140.0]
2025-08-07 12:23:19,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 25 minutes, 7 seconds)
2025-08-07 12:25:05,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:07,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 223.94980 ± 181.976
2025-08-07 12:25:07,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [516.0251, 5.8640466, 287.79822, 303.4671, 1.6155596, 182.91249, 0.64009047, 516.66754, 233.96237, 190.54547]
2025-08-07 12:25:07,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [309.0, 17.0, 151.0, 156.0, 14.0, 116.0, 13.0, 284.0, 124.0, 104.0]
2025-08-07 12:25:07,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 23 minutes, 5 seconds)
2025-08-07 12:26:54,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:56,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 173.48659 ± 193.358
2025-08-07 12:26:56,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [231.60959, 587.63245, 355.72214, 280.99667, -1.9804934, 4.7544713, 265.17105, 6.7322135, 1.5599242, 2.6677544]
2025-08-07 12:26:56,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 346.0, 186.0, 157.0, 16.0, 15.0, 141.0, 27.0, 15.0, 16.0]
2025-08-07 12:26:56,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 21 minutes, 24 seconds)
2025-08-07 12:28:44,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:46,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 230.84619 ± 132.750
2025-08-07 12:28:46,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [393.28125, 384.97433, 3.8103209, 238.85219, 285.27438, 231.14162, 1.7740599, 362.49307, 213.5726, 193.28798]
2025-08-07 12:28:46,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 188.0, 16.0, 114.0, 195.0, 160.0, 13.0, 148.0, 138.0, 154.0]
2025-08-07 12:28:46,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 19 minutes, 56 seconds)
2025-08-07 12:30:31,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:32,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 147.82413 ± 158.042
2025-08-07 12:30:32,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [6.1757083, 35.68274, 1.4673305, 301.98163, 1.4872899, 192.68407, 478.0599, 280.61087, 177.10175, 2.9900918]
2025-08-07 12:30:32,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 78.0, 12.0, 182.0, 16.0, 135.0, 240.0, 138.0, 90.0, 14.0]
2025-08-07 12:30:32,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 17 minutes, 34 seconds)
2025-08-07 12:32:17,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:19,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 230.86458 ± 174.051
2025-08-07 12:32:19,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [393.7363, 285.0602, 1.0354732, 339.42447, 2.1628876, 88.34595, 323.88144, 566.5306, 194.12503, 114.34342]
2025-08-07 12:32:19,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [180.0, 131.0, 17.0, 176.0, 16.0, 99.0, 162.0, 243.0, 111.0, 137.0]
2025-08-07 12:32:19,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 15 minutes, 36 seconds)
2025-08-07 12:34:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:34:06,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 337.86240 ± 221.275
2025-08-07 12:34:06,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [625.9613, 479.03738, 2.1095037, 508.24545, 444.09735, 0.033007592, 390.87326, 511.6383, 45.610786, 371.01764]
2025-08-07 12:34:06,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [278.0, 331.0, 13.0, 271.0, 272.0, 10.0, 182.0, 279.0, 43.0, 175.0]
2025-08-07 12:34:06,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (337.86) for latency MM1Queue_a033_s075
2025-08-07 12:34:06,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 13 minutes, 40 seconds)
2025-08-07 12:35:50,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:35:51,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 217.15974 ± 185.336
2025-08-07 12:35:51,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [5.996878, 301.21646, 379.28723, 6.0417657, 459.30457, 305.19324, 10.719578, 6.998854, 477.56442, 219.27437]
2025-08-07 12:35:51,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 152.0, 180.0, 16.0, 206.0, 169.0, 30.0, 16.0, 221.0, 112.0]
2025-08-07 12:35:51,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 11 minutes, 24 seconds)
2025-08-07 12:37:35,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:37:37,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 233.44633 ± 189.605
2025-08-07 12:37:37,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [45.36812, 443.53998, 6.297676, 1.2176175, 329.36935, 404.95392, 2.2439396, 237.4484, 473.94324, 390.08112]
2025-08-07 12:37:37,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [42.0, 223.0, 18.0, 17.0, 209.0, 187.0, 12.0, 113.0, 259.0, 200.0]
2025-08-07 12:37:37,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 8 minutes, 59 seconds)
2025-08-07 12:39:21,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:39:23,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 299.08960 ± 252.251
2025-08-07 12:39:23,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [440.401, 35.535328, 4.9665813, 300.37833, 525.91547, 721.10803, 504.91083, 4.926225, 12.664633, 440.0899]
2025-08-07 12:39:23,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [211.0, 82.0, 18.0, 142.0, 294.0, 338.0, 237.0, 18.0, 32.0, 223.0]
2025-08-07 12:39:23,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 7 minutes, 9 seconds)
2025-08-07 12:41:07,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:41:09,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 371.07986 ± 155.225
2025-08-07 12:41:09,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.0356445, 467.81442, 364.30237, 310.40268, 497.25098, 274.19684, 286.75467, 455.85178, 469.87866, 580.3105]
2025-08-07 12:41:09,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 248.0, 169.0, 165.0, 243.0, 138.0, 201.0, 209.0, 196.0, 288.0]
2025-08-07 12:41:09,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (371.08) for latency MM1Queue_a033_s075
2025-08-07 12:41:09,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 5 minutes, 24 seconds)
2025-08-07 12:42:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:42:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 267.74500 ± 244.595
2025-08-07 12:42:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [310.53305, 731.37274, 577.6526, 170.90097, 320.86105, -0.0028719276, 456.28378, 98.47677, 4.4740143, 6.897695]
2025-08-07 12:42:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 385.0, 317.0, 91.0, 140.0, 12.0, 229.0, 200.0, 15.0, 16.0]
2025-08-07 12:42:56,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 3 minutes, 37 seconds)
2025-08-07 12:44:39,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:44:41,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 350.32123 ± 188.008
2025-08-07 12:44:41,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [313.39554, 251.42398, 277.76468, 108.0784, 506.51852, 642.1375, 419.1865, 428.5067, 6.494001, 549.7068]
2025-08-07 12:44:41,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 128.0, 130.0, 175.0, 224.0, 292.0, 160.0, 212.0, 30.0, 256.0]
2025-08-07 12:44:41,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 1 minute, 50 seconds)
2025-08-07 12:46:27,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:46:28,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 325.82709 ± 219.885
2025-08-07 12:46:28,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [398.62128, 514.3543, 2.0843406, 429.65585, 534.47687, -0.6981142, 11.867747, 495.1433, 548.689, 324.0764]
2025-08-07 12:46:28,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 225.0, 16.0, 160.0, 249.0, 13.0, 86.0, 221.0, 255.0, 151.0]
2025-08-07 12:46:28,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 16 seconds)
2025-08-07 12:48:14,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:48:15,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 311.79547 ± 257.335
2025-08-07 12:48:15,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [540.4173, 6.3968797, 389.4596, 557.244, -2.0808747, 594.82837, 3.4628, 594.1306, 18.25183, 415.84412]
2025-08-07 12:48:15,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [218.0, 16.0, 168.0, 250.0, 9.0, 267.0, 15.0, 261.0, 58.0, 163.0]
2025-08-07 12:48:15,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 58 minutes, 35 seconds)
2025-08-07 12:49:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:49:59,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 167.46091 ± 164.484
2025-08-07 12:49:59,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.073825054, 3.1034284, 254.74152, 394.5222, 396.19775, 4.4617276, 1.0753063, 185.3601, 366.58994, 68.48311]
2025-08-07 12:49:59,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 17.0, 126.0, 190.0, 173.0, 16.0, 12.0, 95.0, 169.0, 145.0]
2025-08-07 12:49:59,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 56 minutes, 29 seconds)
2025-08-07 12:51:42,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:51:44,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 388.46106 ± 168.140
2025-08-07 12:51:44,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [469.86887, 8.322049, 393.82233, 419.4772, 637.33374, 416.19604, 472.70078, 475.82758, 161.59337, 429.46878]
2025-08-07 12:51:44,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [211.0, 17.0, 148.0, 165.0, 308.0, 173.0, 204.0, 203.0, 87.0, 187.0]
2025-08-07 12:51:44,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (388.46) for latency MM1Queue_a033_s075
2025-08-07 12:51:44,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 31 seconds)
2025-08-07 12:53:28,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:53:30,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 310.14725 ± 201.719
2025-08-07 12:53:30,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [550.4033, 370.41174, 3.8535135, 484.84125, 2.8354383, 209.66699, 351.65363, 431.54602, 568.7384, 127.5221]
2025-08-07 12:53:30,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 150.0, 14.0, 239.0, 17.0, 105.0, 146.0, 201.0, 250.0, 80.0]
2025-08-07 12:53:30,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 52 minutes, 50 seconds)
2025-08-07 12:55:15,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:55:17,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 401.47778 ± 138.485
2025-08-07 12:55:17,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [494.11148, 515.39417, 34.232517, 408.6885, 341.98282, 341.02707, 414.03534, 453.47476, 547.34875, 464.4824]
2025-08-07 12:55:17,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [243.0, 232.0, 71.0, 161.0, 149.0, 151.0, 194.0, 166.0, 282.0, 210.0]
2025-08-07 12:55:17,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (401.48) for latency MM1Queue_a033_s075
2025-08-07 12:55:17,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 51 minutes, 4 seconds)
2025-08-07 12:56:58,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:57:00,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 343.24673 ± 191.286
2025-08-07 12:57:00,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [540.7621, 316.95895, 409.39514, 573.6461, 325.32285, 2.0462835, 342.0176, 405.40216, -0.15372287, 517.0697]
2025-08-07 12:57:00,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [263.0, 246.0, 160.0, 305.0, 136.0, 13.0, 145.0, 180.0, 14.0, 241.0]
2025-08-07 12:57:00,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 49 minutes)
2025-08-07 12:58:43,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:58:45,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 464.24619 ± 205.470
2025-08-07 12:58:45,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [194.9478, 561.68646, 803.01324, 760.91943, 433.53143, 476.0575, 388.47424, 106.106384, 492.31235, 425.41324]
2025-08-07 12:58:45,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [89.0, 237.0, 309.0, 284.0, 174.0, 211.0, 146.0, 73.0, 199.0, 180.0]
2025-08-07 12:58:45,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (464.25) for latency MM1Queue_a033_s075
2025-08-07 12:58:45,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 23 seconds)
2025-08-07 13:00:30,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:00:31,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 273.68912 ± 255.780
2025-08-07 13:00:31,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [780.2552, 431.36057, 0.9006402, 7.703785, 272.43005, 276.52112, 6.2124586, 473.7285, 4.8753486, 482.9035]
2025-08-07 13:00:31,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [473.0, 159.0, 15.0, 18.0, 124.0, 117.0, 15.0, 188.0, 15.0, 188.0]
2025-08-07 13:00:31,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 41 seconds)
2025-08-07 13:02:14,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:02:16,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 352.85669 ± 181.312
2025-08-07 13:02:16,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [514.61847, 516.3302, 387.5872, 478.0538, 285.1261, 28.073011, 428.41302, 412.3411, 0.5499048, 477.47418]
2025-08-07 13:02:16,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [219.0, 215.0, 148.0, 187.0, 117.0, 78.0, 159.0, 146.0, 16.0, 195.0]
2025-08-07 13:02:16,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 50 seconds)
2025-08-07 13:04:00,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:04:01,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 310.63525 ± 208.106
2025-08-07 13:04:01,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [351.7837, 476.7335, 75.420135, 368.23834, 5.8191595, 641.98517, 473.47028, 449.70667, 4.1546993, 259.04102]
2025-08-07 13:04:01,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 204.0, 107.0, 139.0, 16.0, 361.0, 183.0, 173.0, 14.0, 107.0]
2025-08-07 13:04:01,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 57 seconds)
2025-08-07 13:05:44,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:05:46,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 254.13106 ± 173.676
2025-08-07 13:05:46,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [363.35178, 47.839844, 366.33545, -0.45138744, 143.69194, 3.8365557, 433.52313, 362.95645, 376.8199, 443.40704]
2025-08-07 13:05:46,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 90.0, 142.0, 17.0, 95.0, 14.0, 166.0, 132.0, 142.0, 168.0]
2025-08-07 13:05:46,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 40 minutes, 16 seconds)
2025-08-07 13:07:29,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:07:31,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 311.91437 ± 206.471
2025-08-07 13:07:31,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [118.37551, 503.1362, 518.86566, 476.9943, 495.24863, 7.036054, 156.9081, 3.2776217, 500.70776, 338.59387]
2025-08-07 13:07:31,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [67.0, 212.0, 212.0, 193.0, 193.0, 17.0, 84.0, 17.0, 194.0, 140.0]
2025-08-07 13:07:31,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 32 seconds)
2025-08-07 13:09:14,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:09:17,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 495.20645 ± 356.182
2025-08-07 13:09:17,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [325.33267, 1192.0404, 234.61374, 926.21704, 634.67145, 21.38238, -2.5186646, 561.786, 496.8765, 561.6631]
2025-08-07 13:09:17,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 732.0, 106.0, 362.0, 278.0, 66.0, 14.0, 234.0, 201.0, 229.0]
2025-08-07 13:09:17,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (495.21) for latency MM1Queue_a033_s075
2025-08-07 13:09:17,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 48 seconds)
2025-08-07 13:11:01,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:11:03,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 326.25217 ± 231.022
2025-08-07 13:11:03,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.5502615, 535.672, 614.7971, 13.652139, -1.7342525, 283.26666, 550.02155, 324.2298, 515.27356, 423.79315]
2025-08-07 13:11:03,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 216.0, 318.0, 47.0, 15.0, 121.0, 228.0, 134.0, 209.0, 177.0]
2025-08-07 13:11:03,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 35 minutes, 9 seconds)
2025-08-07 13:12:46,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:12:48,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 388.57516 ± 321.504
2025-08-07 13:12:48,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [810.4072, 657.70685, 687.48694, 299.17404, 1.7004665, 750.71436, 562.8448, 3.8377814, 108.05741, 3.8220794]
2025-08-07 13:12:48,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [377.0, 249.0, 284.0, 128.0, 13.0, 317.0, 223.0, 18.0, 202.0, 15.0]
2025-08-07 13:12:48,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 19 seconds)
2025-08-07 13:14:33,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:14:36,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 638.58807 ± 290.606
2025-08-07 13:14:36,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [500.3603, 819.8698, 1022.697, 752.90356, 581.0296, 404.1798, 709.2079, 1052.7108, 18.842705, 524.07886]
2025-08-07 13:14:36,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [194.0, 360.0, 485.0, 287.0, 222.0, 151.0, 260.0, 396.0, 49.0, 238.0]
2025-08-07 13:14:36,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (638.59) for latency MM1Queue_a033_s075
2025-08-07 13:14:36,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 49 seconds)
2025-08-07 13:16:19,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:16:22,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 488.90479 ± 212.691
2025-08-07 13:16:22,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [369.44385, 709.7505, 310.70767, 844.3663, 350.07443, 437.34866, 361.65353, 356.0504, 858.2735, 291.37903]
2025-08-07 13:16:22,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [145.0, 286.0, 134.0, 482.0, 135.0, 173.0, 139.0, 261.0, 366.0, 130.0]
2025-08-07 13:16:22,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 4 seconds)
2025-08-07 13:18:07,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:18:09,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 353.63782 ± 175.569
2025-08-07 13:18:09,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [400.14743, 510.1657, 421.69598, 83.16484, 419.24088, 469.97937, -2.470631, 499.59024, 504.2892, 230.57509]
2025-08-07 13:18:09,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [157.0, 238.0, 156.0, 131.0, 164.0, 200.0, 15.0, 199.0, 196.0, 103.0]
2025-08-07 13:18:09,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 21 seconds)
2025-08-07 13:19:50,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:19:53,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 469.01651 ± 313.346
2025-08-07 13:19:53,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [691.5443, 401.65897, 1196.0444, 4.3877783, 323.33942, 148.73338, 327.42795, 624.45447, 400.42447, 572.14996]
2025-08-07 13:19:53,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 170.0, 505.0, 17.0, 191.0, 191.0, 150.0, 275.0, 161.0, 235.0]
2025-08-07 13:19:53,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 28 seconds)
2025-08-07 13:21:35,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:21:37,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 431.92447 ± 292.523
2025-08-07 13:21:37,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [617.3756, 704.92615, 381.37546, 660.74164, 0.07673141, 657.55707, 8.222916, 609.8284, 3.2450287, 675.89557]
2025-08-07 13:21:37,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [266.0, 339.0, 142.0, 222.0, 12.0, 236.0, 17.0, 232.0, 15.0, 227.0]
2025-08-07 13:21:37,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 42 seconds)
2025-08-07 13:23:24,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:23:28,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 691.82080 ± 431.837
2025-08-07 13:23:28,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.7768368, 384.73242, 1165.8369, 846.7831, 680.0332, 1487.0425, 119.74644, 654.5901, 970.03723, 607.63]
2025-08-07 13:23:28,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 156.0, 477.0, 335.0, 310.0, 568.0, 68.0, 267.0, 384.0, 274.0]
2025-08-07 13:23:28,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1226 [INFO]: New best (691.82) for latency MM1Queue_a033_s075
2025-08-07 13:23:28,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 2 seconds)
2025-08-07 13:25:11,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:25:13,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 453.01190 ± 375.559
2025-08-07 13:25:13,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [763.0015, 5.1518598, 1.9804059, 795.57666, 928.0084, 725.89624, 436.7224, 56.44455, 815.7591, 1.5781574]
2025-08-07 13:25:13,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 16.0, 12.0, 332.0, 392.0, 292.0, 173.0, 98.0, 330.0, 12.0]
2025-08-07 13:25:13,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 14 seconds)
2025-08-07 13:26:54,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:26:57,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 554.83215 ± 318.794
2025-08-07 13:26:57,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [918.20807, 737.55304, -0.61513466, 670.17676, 416.25012, 3.9477093, 406.66846, 825.7533, 735.78723, 834.592]
2025-08-07 13:26:57,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [404.0, 315.0, 15.0, 284.0, 152.0, 17.0, 156.0, 321.0, 276.0, 393.0]
2025-08-07 13:26:57,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 22 seconds)
2025-08-07 13:28:44,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:28:45,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 342.84137 ± 282.619
2025-08-07 13:28:45,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.62372, 790.7289, 214.52666, 1.2267467, 378.84103, 336.66385, 485.60785, -0.61826015, 421.53253, 795.28064]
2025-08-07 13:28:45,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 279.0, 103.0, 17.0, 152.0, 140.0, 208.0, 13.0, 155.0, 329.0]
2025-08-07 13:28:45,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 45 seconds)
2025-08-07 13:30:29,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:30:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 566.57715 ± 409.195
2025-08-07 13:30:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [191.73618, 571.5371, 614.3284, 3.201415, 364.0789, 633.6502, 1177.3944, 1284.5154, 753.8295, 71.49998]
2025-08-07 13:30:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [97.0, 279.0, 259.0, 14.0, 140.0, 237.0, 527.0, 467.0, 287.0, 110.0]
2025-08-07 13:30:32,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 1 second)
2025-08-07 13:32:14,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:32:16,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 365.98749 ± 311.274
2025-08-07 13:32:16,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.2445016, 576.297, 463.4237, 69.401, 8.394797, 362.96222, 549.5631, 3.0308077, 868.21344, 754.34436]
2025-08-07 13:32:16,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 230.0, 178.0, 110.0, 34.0, 133.0, 233.0, 16.0, 330.0, 275.0]
2025-08-07 13:32:16,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 4 seconds)
2025-08-07 13:34:02,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:34:05,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 579.17773 ± 396.155
2025-08-07 13:34:05,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [420.87274, 941.7742, 1097.9764, 1314.2148, 136.18257, 328.18152, 738.91974, 125.25126, 377.2438, 311.16037]
2025-08-07 13:34:05,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [163.0, 422.0, 431.0, 500.0, 74.0, 135.0, 316.0, 74.0, 147.0, 125.0]
2025-08-07 13:34:05,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 25 seconds)
2025-08-07 13:35:47,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:35:50,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 539.02454 ± 542.480
2025-08-07 13:35:50,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [382.11725, 433.02753, 0.030637002, 799.3106, 5.4619, 375.8236, 5.759506, 646.7907, 843.1441, 1898.7795]
2025-08-07 13:35:50,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 166.0, 13.0, 260.0, 17.0, 147.0, 18.0, 261.0, 312.0, 746.0]
2025-08-07 13:35:50,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 39 seconds)
2025-08-07 13:37:35,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:37:37,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 582.41119 ± 405.367
2025-08-07 13:37:37,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [137.72006, 3.93988, 377.12842, 1043.5547, 745.2497, 4.922888, 586.3838, 992.5976, 1094.7914, 837.82306]
2025-08-07 13:37:37,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [75.0, 15.0, 150.0, 376.0, 266.0, 15.0, 218.0, 490.0, 433.0, 324.0]
2025-08-07 13:37:37,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 52 seconds)
2025-08-07 13:39:22,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:39:24,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 398.56454 ± 447.400
2025-08-07 13:39:24,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [675.763, 655.99225, 1410.3876, 417.82227, 12.362961, 71.72205, 730.8294, 4.8031826, 4.755346, 1.2076187]
2025-08-07 13:39:24,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [254.0, 225.0, 583.0, 149.0, 74.0, 131.0, 267.0, 18.0, 15.0, 17.0]
2025-08-07 13:39:24,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 5 seconds)
2025-08-07 13:41:09,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:41:11,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 445.28036 ± 369.528
2025-08-07 13:41:11,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [889.339, 753.2608, 7.8357067, 897.8612, 454.7921, 5.3490868, 200.0266, 908.095, 1.2818601, 334.96243]
2025-08-07 13:41:11,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [527.0, 298.0, 16.0, 327.0, 164.0, 15.0, 104.0, 343.0, 17.0, 128.0]
2025-08-07 13:41:11,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 21 seconds)
2025-08-07 13:42:57,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:42:59,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 443.33829 ± 456.002
2025-08-07 13:42:59,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [605.5983, 402.87167, 207.06148, 1287.2018, 962.6179, 2.947106, 2.9285426, 942.90765, 1.4161094, 17.832184]
2025-08-07 13:42:59,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [219.0, 153.0, 98.0, 467.0, 366.0, 17.0, 15.0, 331.0, 15.0, 57.0]
2025-08-07 13:42:59,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 33 seconds)
2025-08-07 13:44:46,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:44:49,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 620.80273 ± 523.747
2025-08-07 13:44:49,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [22.971855, 317.1463, 1127.6343, 0.24876729, 705.0587, 1792.6241, 263.74417, 389.1092, 673.111, 916.37933]
2025-08-07 13:44:49,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [75.0, 128.0, 541.0, 12.0, 256.0, 631.0, 122.0, 142.0, 231.0, 345.0]
2025-08-07 13:44:49,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 47 seconds)
2025-08-07 13:46:28,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:46:30,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 417.94418 ± 314.431
2025-08-07 13:46:30,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [695.21594, 662.59406, 348.9678, 0.0130765075, 881.65, 740.8457, 3.0604239, 487.86075, 356.54318, 2.691048]
2025-08-07 13:46:30,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [226.0, 243.0, 135.0, 12.0, 313.0, 267.0, 13.0, 165.0, 134.0, 15.0]
2025-08-07 13:46:30,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-walker2d):1251 [DEBUG]: Training session finished
