2025-08-07 10:14:05,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc15-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:14:05,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc15-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:14:05,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x15341db83c10>}
2025-08-07 10:14:05,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 10:14:05,827 baseline-bpql-noiseperc15-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:14:05,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 10:14:05,846 baseline-bpql-noiseperc15-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=648, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 10:14:05,846 baseline-bpql-noiseperc15-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:14:07,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 10:14:07,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 10:15:54,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:55,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 276.85803 ± 88.501
2025-08-07 10:15:55,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [129.55804, 249.13197, 337.87146, 367.9449, 268.52072, 282.35358, 117.795586, 395.4038, 343.0792, 276.9212]
2025-08-07 10:15:55,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 49.0, 63.0, 69.0, 51.0, 54.0, 23.0, 77.0, 66.0, 53.0]
2025-08-07 10:15:55,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (276.86) for latency MM1Queue_a033_s075
2025-08-07 10:15:55,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 57 minutes, 35 seconds)
2025-08-07 10:17:51,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:52,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 363.91458 ± 107.481
2025-08-07 10:17:52,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [353.3947, 431.82843, 333.75095, 371.15155, 328.0124, 424.83917, 521.98175, 446.81857, 95.704, 331.66437]
2025-08-07 10:17:52,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 81.0, 60.0, 71.0, 61.0, 79.0, 101.0, 87.0, 19.0, 68.0]
2025-08-07 10:17:52,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (363.91) for latency MM1Queue_a033_s075
2025-08-07 10:17:52,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 3 minutes, 36 seconds)
2025-08-07 10:19:47,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:48,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 235.03984 ± 139.912
2025-08-07 10:19:48,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [113.46684, 166.32764, 507.3328, 151.68579, 134.70757, 139.30685, 113.23874, 423.69003, 208.17482, 392.4675]
2025-08-07 10:19:48,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 32.0, 102.0, 29.0, 26.0, 27.0, 22.0, 90.0, 46.0, 85.0]
2025-08-07 10:19:48,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 3 minutes, 35 seconds)
2025-08-07 10:21:44,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:45,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 252.56157 ± 172.519
2025-08-07 10:21:45,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [150.96245, 282.44412, 125.176956, 118.04893, 141.31937, 130.37552, 426.0903, 635.77136, 412.26538, 103.16123]
2025-08-07 10:21:45,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 53.0, 24.0, 23.0, 27.0, 25.0, 78.0, 135.0, 79.0, 20.0]
2025-08-07 10:21:45,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 2 minutes, 58 seconds)
2025-08-07 10:23:40,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:40,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 157.10921 ± 68.643
2025-08-07 10:23:40,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [131.09462, 316.9943, 122.34889, 195.47539, 110.919334, 242.97772, 112.618835, 107.222305, 135.67941, 95.76129]
2025-08-07 10:23:40,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 65.0, 24.0, 37.0, 22.0, 47.0, 22.0, 21.0, 26.0, 19.0]
2025-08-07 10:23:40,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 1 minute, 23 seconds)
2025-08-07 10:25:35,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:36,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 216.47522 ± 102.262
2025-08-07 10:25:36,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [139.36801, 132.31412, 400.52, 130.39842, 156.69795, 122.2703, 316.87964, 344.53253, 285.96902, 135.80228]
2025-08-07 10:25:36,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 26.0, 76.0, 25.0, 30.0, 24.0, 57.0, 68.0, 57.0, 26.0]
2025-08-07 10:25:36,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 1 minute, 59 seconds)
2025-08-07 10:27:30,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:31,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 195.23491 ± 72.701
2025-08-07 10:27:31,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [118.46164, 112.528275, 150.86394, 259.218, 274.24683, 231.43114, 149.55338, 101.78028, 310.81848, 243.4472]
2025-08-07 10:27:31,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 22.0, 29.0, 50.0, 55.0, 45.0, 29.0, 20.0, 63.0, 49.0]
2025-08-07 10:27:31,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 59 minutes, 24 seconds)
2025-08-07 10:29:25,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:25,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 152.03513 ± 59.085
2025-08-07 10:29:25,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [145.98401, 142.83807, 133.98186, 198.14914, 310.88553, 122.23008, 118.10504, 108.87238, 143.14378, 96.16138]
2025-08-07 10:29:25,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 28.0, 26.0, 39.0, 60.0, 24.0, 23.0, 21.0, 28.0, 19.0]
2025-08-07 10:29:25,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 57 minutes)
2025-08-07 10:31:20,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:21,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 296.35501 ± 162.757
2025-08-07 10:31:21,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [314.93414, 107.881165, 95.928375, 373.35324, 439.89618, 211.44272, 138.07031, 318.86438, 656.138, 307.0416]
2025-08-07 10:31:21,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 21.0, 19.0, 71.0, 85.0, 44.0, 27.0, 59.0, 131.0, 61.0]
2025-08-07 10:31:21,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 54 minutes, 52 seconds)
2025-08-07 10:33:17,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:18,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 287.88693 ± 147.406
2025-08-07 10:33:18,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [284.15182, 526.75385, 107.96359, 102.36785, 219.03546, 307.68448, 457.23923, 420.4857, 95.22538, 357.96207]
2025-08-07 10:33:18,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 103.0, 21.0, 20.0, 44.0, 57.0, 85.0, 81.0, 19.0, 67.0]
2025-08-07 10:33:18,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 53 minutes, 29 seconds)
2025-08-07 10:35:16,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:17,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 327.29007 ± 138.281
2025-08-07 10:35:17,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [525.5065, 266.00308, 347.36148, 90.06569, 347.1079, 490.08194, 106.749084, 330.01398, 316.84998, 453.1609]
2025-08-07 10:35:17,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 50.0, 63.0, 18.0, 67.0, 92.0, 21.0, 62.0, 58.0, 87.0]
2025-08-07 10:35:17,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 52 minutes, 29 seconds)
2025-08-07 10:37:14,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:15,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 253.26765 ± 143.200
2025-08-07 10:37:15,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [89.82556, 124.33014, 145.34735, 102.79805, 310.63013, 457.3627, 380.0731, 428.3894, 113.42119, 380.499]
2025-08-07 10:37:15,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 24.0, 28.0, 20.0, 59.0, 95.0, 72.0, 93.0, 22.0, 79.0]
2025-08-07 10:37:15,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 51 minutes, 25 seconds)
2025-08-07 10:39:11,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:12,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 272.67419 ± 130.868
2025-08-07 10:39:12,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [351.47992, 89.5353, 95.042694, 336.67416, 358.2927, 405.68805, 346.8146, 105.381714, 190.303, 447.52975]
2025-08-07 10:39:12,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 18.0, 19.0, 66.0, 69.0, 74.0, 65.0, 21.0, 36.0, 83.0]
2025-08-07 10:39:12,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 50 minutes, 12 seconds)
2025-08-07 10:41:08,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:09,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 301.93655 ± 133.803
2025-08-07 10:41:09,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [500.33826, 341.49783, 138.51965, 140.29071, 155.73453, 248.22182, 302.90707, 264.98593, 402.2787, 524.5911]
2025-08-07 10:41:09,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 63.0, 27.0, 27.0, 30.0, 53.0, 57.0, 52.0, 75.0, 101.0]
2025-08-07 10:41:09,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 48 minutes, 35 seconds)
2025-08-07 10:43:05,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:06,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 237.40852 ± 134.281
2025-08-07 10:43:06,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [107.3081, 279.3234, 125.41812, 318.76382, 477.64078, 435.2042, 96.13741, 276.6585, 145.4497, 112.18125]
2025-08-07 10:43:06,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 52.0, 24.0, 60.0, 91.0, 81.0, 19.0, 51.0, 28.0, 22.0]
2025-08-07 10:43:06,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 46 minutes, 31 seconds)
2025-08-07 10:45:03,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:03,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 236.49008 ± 150.837
2025-08-07 10:45:03,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [96.60349, 560.2597, 123.709656, 123.71208, 106.815735, 140.48409, 337.46848, 147.2984, 384.8117, 343.7375]
2025-08-07 10:45:03,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 119.0, 24.0, 24.0, 21.0, 27.0, 63.0, 28.0, 71.0, 67.0]
2025-08-07 10:45:03,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 44 minutes, 10 seconds)
2025-08-07 10:47:00,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:00,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 186.76207 ± 93.362
2025-08-07 10:47:00,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [161.43936, 111.55319, 102.61546, 112.08563, 318.02545, 124.73936, 334.78452, 108.055725, 170.57555, 323.74646]
2025-08-07 10:47:00,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 22.0, 20.0, 22.0, 62.0, 24.0, 68.0, 21.0, 33.0, 61.0]
2025-08-07 10:47:00,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 41 minutes, 52 seconds)
2025-08-07 10:48:57,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:58,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 300.68951 ± 152.819
2025-08-07 10:48:58,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [481.8517, 470.81036, 124.941345, 128.69095, 312.53693, 127.00041, 129.98093, 365.9445, 352.53476, 512.6033]
2025-08-07 10:48:58,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 87.0, 24.0, 25.0, 58.0, 25.0, 25.0, 69.0, 65.0, 97.0]
2025-08-07 10:48:58,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 40 minutes, 6 seconds)
2025-08-07 10:50:54,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:55,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 341.94431 ± 77.925
2025-08-07 10:50:55,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [500.75406, 372.3442, 367.1238, 373.4297, 335.52655, 171.61522, 367.20065, 306.65582, 324.02682, 300.76627]
2025-08-07 10:50:55,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 74.0, 67.0, 69.0, 62.0, 33.0, 67.0, 58.0, 63.0, 57.0]
2025-08-07 10:50:55,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 38 minutes, 16 seconds)
2025-08-07 10:52:51,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:52,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 362.02939 ± 205.755
2025-08-07 10:52:52,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [151.92296, 128.6515, 360.75024, 114.74057, 404.26575, 435.26633, 838.8235, 526.40485, 328.73834, 330.72986]
2025-08-07 10:52:52,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 25.0, 69.0, 22.0, 74.0, 80.0, 170.0, 98.0, 61.0, 61.0]
2025-08-07 10:52:52,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 36 minutes, 24 seconds)
2025-08-07 10:54:49,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:50,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 271.44196 ± 125.310
2025-08-07 10:54:50,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [374.80402, 394.49075, 437.02966, 107.12142, 157.75838, 305.25208, 107.11106, 124.91177, 374.6963, 331.24402]
2025-08-07 10:54:50,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 74.0, 83.0, 21.0, 30.0, 55.0, 21.0, 24.0, 73.0, 62.0]
2025-08-07 10:54:50,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 34 minutes, 22 seconds)
2025-08-07 10:56:46,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:47,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 327.02838 ± 116.051
2025-08-07 10:56:47,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [293.2071, 378.75235, 394.54675, 318.18103, 131.07938, 496.49142, 265.84512, 441.24243, 411.48996, 139.4481]
2025-08-07 10:56:47,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 74.0, 73.0, 59.0, 25.0, 93.0, 52.0, 82.0, 78.0, 27.0]
2025-08-07 10:56:47,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 32 minutes, 40 seconds)
2025-08-07 10:58:44,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:45,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 328.58554 ± 142.628
2025-08-07 10:58:45,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [145.43634, 462.22876, 341.72604, 375.96185, 472.46423, 379.03696, 483.6317, 406.63678, 116.94545, 101.787155]
2025-08-07 10:58:45,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 99.0, 66.0, 69.0, 87.0, 72.0, 88.0, 76.0, 23.0, 20.0]
2025-08-07 10:58:45,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 30 minutes, 46 seconds)
2025-08-07 11:00:42,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:43,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 358.71521 ± 169.866
2025-08-07 11:00:43,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [517.4878, 593.7052, 113.72568, 379.1177, 380.84048, 136.11667, 412.35803, 497.74442, 454.6802, 101.37586]
2025-08-07 11:00:43,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 109.0, 22.0, 67.0, 68.0, 26.0, 77.0, 93.0, 82.0, 20.0]
2025-08-07 11:00:43,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 28 minutes, 51 seconds)
2025-08-07 11:02:40,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:40,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 165.26344 ± 80.564
2025-08-07 11:02:40,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [125.38612, 113.25016, 129.96922, 102.54247, 151.43253, 355.53342, 136.4818, 145.63245, 285.3876, 107.01864]
2025-08-07 11:02:40,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 25.0, 20.0, 29.0, 66.0, 26.0, 28.0, 54.0, 21.0]
2025-08-07 11:02:40,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 26 minutes, 59 seconds)
2025-08-07 11:04:38,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:39,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 320.47821 ± 184.374
2025-08-07 11:04:39,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [413.27762, 119.09634, 352.93207, 741.4894, 389.41168, 134.96269, 150.20145, 149.95232, 443.42502, 310.03366]
2025-08-07 11:04:39,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 23.0, 66.0, 138.0, 79.0, 26.0, 29.0, 29.0, 93.0, 57.0]
2025-08-07 11:04:39,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 25 minutes, 15 seconds)
2025-08-07 11:06:35,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:35,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 279.68414 ± 143.705
2025-08-07 11:06:35,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [124.660934, 427.5048, 504.53574, 114.57263, 288.14484, 101.268456, 430.68085, 354.5971, 134.31166, 316.56424]
2025-08-07 11:06:35,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 79.0, 92.0, 22.0, 53.0, 20.0, 80.0, 68.0, 26.0, 66.0]
2025-08-07 11:06:35,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 23 minutes, 5 seconds)
2025-08-07 11:08:33,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:34,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 338.43427 ± 147.528
2025-08-07 11:08:34,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [288.10892, 522.38525, 539.8199, 89.630104, 180.35281, 455.84808, 365.36465, 409.9973, 375.5527, 157.28294]
2025-08-07 11:08:34,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 96.0, 96.0, 18.0, 35.0, 85.0, 66.0, 83.0, 71.0, 30.0]
2025-08-07 11:08:34,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 21 minutes, 23 seconds)
2025-08-07 11:10:30,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:31,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 271.00830 ± 119.134
2025-08-07 11:10:31,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [151.17525, 392.18762, 352.30795, 348.8093, 126.28986, 373.79327, 309.30182, 133.31622, 106.36095, 416.5409]
2025-08-07 11:10:31,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 72.0, 65.0, 66.0, 24.0, 69.0, 58.0, 26.0, 21.0, 78.0]
2025-08-07 11:10:31,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 19 minutes, 11 seconds)
2025-08-07 11:12:27,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:28,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 436.40298 ± 202.424
2025-08-07 11:12:28,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [342.79175, 666.1124, 781.46735, 516.65894, 583.19385, 166.21153, 466.50528, 108.78107, 437.05112, 295.25674]
2025-08-07 11:12:28,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 133.0, 146.0, 96.0, 107.0, 32.0, 85.0, 21.0, 82.0, 53.0]
2025-08-07 11:12:28,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (436.40) for latency MM1Queue_a033_s075
2025-08-07 11:12:28,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 minutes, 3 seconds)
2025-08-07 11:14:25,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:26,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 292.41180 ± 169.673
2025-08-07 11:14:26,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [614.89606, 404.77426, 295.9061, 370.55206, 125.45079, 473.80258, 317.80728, 123.4373, 107.71181, 89.77972]
2025-08-07 11:14:26,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 77.0, 59.0, 69.0, 24.0, 88.0, 59.0, 24.0, 21.0, 18.0]
2025-08-07 11:14:26,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 15 minutes)
2025-08-07 11:16:21,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:21,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 280.80508 ± 163.173
2025-08-07 11:16:21,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [170.97917, 101.76786, 130.08138, 113.06288, 316.8983, 113.687645, 482.1835, 512.31335, 458.5034, 408.57336]
2025-08-07 11:16:21,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 20.0, 25.0, 22.0, 59.0, 22.0, 91.0, 94.0, 84.0, 75.0]
2025-08-07 11:16:21,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 minutes, 48 seconds)
2025-08-07 11:18:17,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:18,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 309.11673 ± 130.479
2025-08-07 11:18:18,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [335.31393, 320.96936, 112.449684, 422.76672, 413.94217, 113.07201, 153.42859, 499.19675, 322.22754, 397.80063]
2025-08-07 11:18:18,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 60.0, 22.0, 80.0, 76.0, 22.0, 29.0, 92.0, 60.0, 76.0]
2025-08-07 11:18:18,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 10 minutes, 24 seconds)
2025-08-07 11:20:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:15,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 439.69809 ± 148.135
2025-08-07 11:20:15,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [102.805855, 479.72528, 398.42444, 531.08344, 571.9753, 534.9684, 669.1691, 383.91037, 365.02808, 359.89105]
2025-08-07 11:20:15,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 102.0, 74.0, 99.0, 105.0, 100.0, 126.0, 78.0, 68.0, 65.0]
2025-08-07 11:20:15,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (439.70) for latency MM1Queue_a033_s075
2025-08-07 11:20:15,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 24 seconds)
2025-08-07 11:22:11,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:12,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 270.60971 ± 178.041
2025-08-07 11:22:12,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [90.24276, 529.13025, 422.38223, 89.71578, 541.2693, 382.90317, 125.44852, 309.79422, 96.4432, 118.76779]
2025-08-07 11:22:12,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 97.0, 79.0, 18.0, 107.0, 72.0, 24.0, 60.0, 19.0, 23.0]
2025-08-07 11:22:12,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 30 seconds)
2025-08-07 11:24:09,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:10,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 289.94006 ± 140.715
2025-08-07 11:24:10,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [132.5762, 374.23105, 342.4616, 164.7044, 108.681946, 382.971, 399.2662, 103.08852, 519.16724, 372.2525]
2025-08-07 11:24:10,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 82.0, 62.0, 32.0, 21.0, 73.0, 73.0, 20.0, 93.0, 72.0]
2025-08-07 11:24:10,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 4 minutes, 42 seconds)
2025-08-07 11:26:09,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:10,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 326.54559 ± 156.442
2025-08-07 11:26:10,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [105.151184, 435.36627, 545.6229, 338.12793, 507.97256, 247.62964, 120.30602, 145.03238, 469.9478, 350.2991]
2025-08-07 11:26:10,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 80.0, 99.0, 63.0, 94.0, 47.0, 23.0, 28.0, 100.0, 66.0]
2025-08-07 11:26:10,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 34 seconds)
2025-08-07 11:28:09,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:10,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 363.99902 ± 184.152
2025-08-07 11:28:10,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [134.73274, 596.7437, 513.638, 151.56364, 118.957504, 311.1362, 510.16425, 349.89185, 642.58435, 310.5781]
2025-08-07 11:28:10,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 129.0, 92.0, 29.0, 23.0, 62.0, 93.0, 69.0, 119.0, 57.0]
2025-08-07 11:28:10,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 2 minutes, 19 seconds)
2025-08-07 11:30:05,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 381.48819 ± 153.068
2025-08-07 11:30:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [460.40854, 233.73096, 89.706345, 294.96567, 517.10596, 488.6481, 312.87967, 637.35956, 308.1379, 471.93924]
2025-08-07 11:30:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 46.0, 18.0, 56.0, 95.0, 92.0, 56.0, 117.0, 59.0, 88.0]
2025-08-07 11:30:06,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 8 seconds)
2025-08-07 11:32:02,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:03,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 361.05783 ± 262.352
2025-08-07 11:32:03,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [96.70246, 625.9721, 102.25456, 754.06464, 118.35891, 108.22159, 547.1788, 550.8928, 611.5046, 95.427734]
2025-08-07 11:32:03,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 125.0, 20.0, 142.0, 23.0, 21.0, 115.0, 101.0, 120.0, 19.0]
2025-08-07 11:32:03,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 58 minutes, 12 seconds)
2025-08-07 11:33:59,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:00,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 327.55777 ± 151.730
2025-08-07 11:34:00,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [102.317116, 294.8772, 152.75305, 590.032, 394.19193, 125.00468, 484.98782, 347.4598, 388.61624, 395.33786]
2025-08-07 11:34:00,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 55.0, 29.0, 125.0, 85.0, 24.0, 89.0, 62.0, 82.0, 74.0]
2025-08-07 11:34:00,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 55 minutes, 55 seconds)
2025-08-07 11:35:55,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:56,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 362.69440 ± 170.086
2025-08-07 11:35:56,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [519.46027, 571.7593, 534.37244, 387.10388, 431.91324, 124.57665, 133.75465, 103.03313, 460.70197, 360.26862]
2025-08-07 11:35:56,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 119.0, 101.0, 72.0, 78.0, 24.0, 26.0, 20.0, 85.0, 67.0]
2025-08-07 11:35:56,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 53 minutes, 24 seconds)
2025-08-07 11:37:53,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:54,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 358.18930 ± 204.806
2025-08-07 11:37:54,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [298.58682, 124.00136, 441.62674, 522.37695, 181.86537, 682.9856, 496.5354, 141.62082, 595.3707, 96.92344]
2025-08-07 11:37:54,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 24.0, 83.0, 98.0, 35.0, 138.0, 91.0, 27.0, 112.0, 19.0]
2025-08-07 11:37:54,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 50 minutes, 51 seconds)
2025-08-07 11:39:49,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:50,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 320.83234 ± 144.037
2025-08-07 11:39:50,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [439.09668, 359.20276, 377.72855, 349.61154, 136.7013, 332.62997, 409.98, 107.87481, 570.40515, 125.0923]
2025-08-07 11:39:50,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 68.0, 67.0, 64.0, 26.0, 72.0, 74.0, 21.0, 108.0, 24.0]
2025-08-07 11:39:50,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 49 minutes, 6 seconds)
2025-08-07 11:41:46,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:47,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 349.64749 ± 162.752
2025-08-07 11:41:47,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [461.9325, 463.5281, 443.33646, 453.13638, 471.70294, 345.09332, 124.667404, 101.61003, 103.06086, 528.40704]
2025-08-07 11:41:47,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 94.0, 80.0, 82.0, 88.0, 65.0, 24.0, 20.0, 20.0, 98.0]
2025-08-07 11:41:47,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 47 minutes, 6 seconds)
2025-08-07 11:43:43,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:44,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 409.38986 ± 213.136
2025-08-07 11:43:44,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [395.74396, 161.25282, 367.2017, 383.22318, 130.53407, 574.29095, 406.8339, 328.3766, 409.29715, 937.1446]
2025-08-07 11:43:44,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 31.0, 71.0, 72.0, 25.0, 113.0, 75.0, 60.0, 80.0, 200.0]
2025-08-07 11:43:44,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 15 seconds)
2025-08-07 11:45:40,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:41,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 355.57025 ± 188.996
2025-08-07 11:45:41,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [255.33578, 702.7156, 404.6852, 579.8801, 440.2266, 140.93553, 431.9942, 131.45377, 101.64312, 366.83267]
2025-08-07 11:45:41,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [49.0, 132.0, 79.0, 107.0, 79.0, 27.0, 78.0, 25.0, 20.0, 68.0]
2025-08-07 11:45:41,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 43 minutes, 17 seconds)
2025-08-07 11:47:37,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:38,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 300.68491 ± 156.675
2025-08-07 11:47:38,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [444.10007, 459.1021, 371.18146, 545.85535, 322.66495, 114.37451, 376.70462, 119.9429, 112.709915, 140.2131]
2025-08-07 11:47:38,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 85.0, 77.0, 101.0, 63.0, 22.0, 71.0, 23.0, 22.0, 27.0]
2025-08-07 11:47:38,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 41 minutes, 20 seconds)
2025-08-07 11:49:36,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:37,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 327.34302 ± 141.399
2025-08-07 11:49:37,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [303.87817, 401.35287, 108.68992, 141.58455, 454.10532, 425.47916, 113.793564, 413.98203, 460.20563, 450.3588]
2025-08-07 11:49:37,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 75.0, 21.0, 27.0, 82.0, 79.0, 22.0, 76.0, 89.0, 83.0]
2025-08-07 11:49:37,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 39 minutes, 40 seconds)
2025-08-07 11:51:31,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:32,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 374.02023 ± 143.978
2025-08-07 11:51:32,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [558.8546, 389.29712, 320.68713, 527.1258, 410.5902, 114.6183, 140.34018, 400.10196, 523.49286, 355.09415]
2025-08-07 11:51:32,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 70.0, 58.0, 111.0, 89.0, 22.0, 27.0, 74.0, 96.0, 69.0]
2025-08-07 11:51:32,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 37 minutes, 32 seconds)
2025-08-07 11:53:28,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:30,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 457.36334 ± 75.405
2025-08-07 11:53:30,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [585.60065, 439.9439, 434.61597, 555.7199, 419.10785, 524.4278, 380.97345, 351.00058, 385.07355, 497.1697]
2025-08-07 11:53:30,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 92.0, 80.0, 104.0, 88.0, 108.0, 71.0, 64.0, 80.0, 105.0]
2025-08-07 11:53:30,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (457.36) for latency MM1Queue_a033_s075
2025-08-07 11:53:30,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 35 minutes, 35 seconds)
2025-08-07 11:55:25,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:26,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 396.82123 ± 156.894
2025-08-07 11:55:26,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [106.90959, 634.49054, 548.6328, 518.9888, 367.1908, 380.08688, 440.14246, 346.13394, 158.00864, 467.62827]
2025-08-07 11:55:26,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 124.0, 100.0, 94.0, 68.0, 78.0, 80.0, 64.0, 30.0, 103.0]
2025-08-07 11:55:26,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 33 minutes, 39 seconds)
2025-08-07 11:57:23,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:23,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 302.26471 ± 143.262
2025-08-07 11:57:23,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [96.06306, 325.54764, 114.70115, 433.26364, 96.64685, 329.6447, 406.104, 340.0809, 531.10266, 349.49246]
2025-08-07 11:57:23,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 64.0, 22.0, 79.0, 19.0, 62.0, 76.0, 63.0, 98.0, 65.0]
2025-08-07 11:57:23,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 31 minutes, 39 seconds)
2025-08-07 11:59:19,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 223.42000 ± 175.140
2025-08-07 11:59:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [414.88467, 126.75677, 135.97847, 114.425156, 524.1802, 107.532104, 521.23865, 89.63935, 90.88538, 108.679184]
2025-08-07 11:59:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 24.0, 26.0, 22.0, 93.0, 21.0, 95.0, 18.0, 18.0, 21.0]
2025-08-07 11:59:20,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 29 minutes, 26 seconds)
2025-08-07 12:01:16,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:17,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 337.50574 ± 166.212
2025-08-07 12:01:17,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [505.05734, 262.00723, 456.86926, 118.59715, 89.874245, 438.83484, 577.6201, 371.2647, 131.29861, 423.63385]
2025-08-07 12:01:17,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 48.0, 85.0, 23.0, 18.0, 81.0, 106.0, 67.0, 25.0, 83.0]
2025-08-07 12:01:17,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 27 minutes, 39 seconds)
2025-08-07 12:03:13,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:14,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 271.69766 ± 172.155
2025-08-07 12:03:14,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [415.7214, 89.454216, 535.6212, 438.84943, 140.21205, 103.01764, 96.47149, 95.560265, 445.18045, 356.88855]
2025-08-07 12:03:14,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 18.0, 99.0, 80.0, 27.0, 20.0, 19.0, 19.0, 82.0, 81.0]
2025-08-07 12:03:14,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 25 minutes, 40 seconds)
2025-08-07 12:05:10,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:11,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 428.83368 ± 201.462
2025-08-07 12:05:11,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [144.72604, 292.9265, 585.0013, 735.27216, 355.0148, 387.02856, 503.0876, 504.50894, 101.42589, 679.34546]
2025-08-07 12:05:11,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 58.0, 109.0, 138.0, 66.0, 69.0, 91.0, 91.0, 20.0, 128.0]
2025-08-07 12:05:11,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 23 minutes, 48 seconds)
2025-08-07 12:07:08,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:09,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 324.24286 ± 123.971
2025-08-07 12:07:09,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [390.76996, 404.0925, 476.16364, 368.76016, 435.27707, 156.5982, 102.53211, 373.3965, 169.02515, 365.81335]
2025-08-07 12:07:09,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 75.0, 94.0, 66.0, 80.0, 30.0, 20.0, 71.0, 32.0, 70.0]
2025-08-07 12:07:09,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 21 minutes, 57 seconds)
2025-08-07 12:09:04,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:05,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 414.55508 ± 157.388
2025-08-07 12:09:05,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [361.87933, 386.7778, 421.0199, 530.0698, 555.6176, 699.11444, 476.2622, 96.89326, 346.9627, 270.95404]
2025-08-07 12:09:05,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 83.0, 78.0, 117.0, 101.0, 123.0, 90.0, 19.0, 62.0, 53.0]
2025-08-07 12:09:05,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 20 minutes, 1 second)
2025-08-07 12:11:01,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:02,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 358.22052 ± 200.214
2025-08-07 12:11:02,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [460.47137, 523.08997, 96.331604, 482.81985, 624.82666, 102.63957, 583.80524, 134.51898, 412.5307, 161.17123]
2025-08-07 12:11:02,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 114.0, 19.0, 101.0, 121.0, 20.0, 109.0, 26.0, 83.0, 31.0]
2025-08-07 12:11:02,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 18 minutes, 2 seconds)
2025-08-07 12:12:58,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:59,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 410.44003 ± 214.425
2025-08-07 12:12:59,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [676.4037, 499.61548, 102.998276, 534.61926, 605.0892, 113.056984, 655.29456, 119.92626, 408.70206, 388.69476]
2025-08-07 12:12:59,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 90.0, 20.0, 97.0, 112.0, 22.0, 118.0, 23.0, 74.0, 71.0]
2025-08-07 12:12:59,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 7 seconds)
2025-08-07 12:14:56,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:56,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 333.28925 ± 191.922
2025-08-07 12:14:56,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [424.66895, 514.9889, 96.07413, 90.01399, 307.3559, 113.52424, 730.7196, 303.40305, 371.20285, 380.94077]
2025-08-07 12:14:56,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 96.0, 19.0, 18.0, 67.0, 22.0, 131.0, 66.0, 66.0, 69.0]
2025-08-07 12:14:56,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 8 seconds)
2025-08-07 12:16:53,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:54,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 377.28000 ± 152.559
2025-08-07 12:16:54,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [492.14734, 344.74707, 396.85196, 405.07776, 379.29004, 113.698296, 416.98642, 491.7669, 623.91565, 108.31856]
2025-08-07 12:16:54,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 64.0, 77.0, 73.0, 69.0, 22.0, 75.0, 89.0, 114.0, 21.0]
2025-08-07 12:16:54,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes, 12 seconds)
2025-08-07 12:18:50,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:51,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 362.26132 ± 279.721
2025-08-07 12:18:51,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [108.36073, 125.25415, 493.5862, 139.67418, 823.8428, 101.15223, 394.27795, 838.72894, 101.77854, 495.95755]
2025-08-07 12:18:51,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 24.0, 105.0, 27.0, 160.0, 20.0, 72.0, 159.0, 20.0, 91.0]
2025-08-07 12:18:51,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 17 seconds)
2025-08-07 12:20:47,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:49,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 515.06860 ± 114.911
2025-08-07 12:20:49,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [390.6096, 333.3703, 473.88834, 546.7266, 532.5097, 580.2605, 587.0683, 486.70172, 447.9721, 771.5784]
2025-08-07 12:20:49,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 64.0, 85.0, 114.0, 99.0, 108.0, 106.0, 86.0, 79.0, 145.0]
2025-08-07 12:20:49,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (515.07) for latency MM1Queue_a033_s075
2025-08-07 12:20:49,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 27 seconds)
2025-08-07 12:22:43,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:45,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 439.56454 ± 203.991
2025-08-07 12:22:45,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [716.2834, 577.9731, 448.91425, 410.1617, 395.19287, 457.4918, 774.76294, 364.17435, 96.69817, 153.9929]
2025-08-07 12:22:45,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 117.0, 86.0, 78.0, 72.0, 83.0, 144.0, 66.0, 19.0, 30.0]
2025-08-07 12:22:45,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 19 seconds)
2025-08-07 12:24:39,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:24:39,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 286.87408 ± 241.211
2025-08-07 12:24:39,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [872.1278, 553.8081, 137.54941, 344.9906, 135.17746, 91.15004, 112.73295, 153.2922, 350.75964, 117.15227]
2025-08-07 12:24:39,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 98.0, 26.0, 63.0, 26.0, 18.0, 22.0, 29.0, 68.0, 23.0]
2025-08-07 12:24:39,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 4 minutes, 7 seconds)
2025-08-07 12:26:34,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:35,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 436.38477 ± 95.077
2025-08-07 12:26:35,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [430.37933, 415.27567, 337.20898, 519.6512, 526.2027, 437.73523, 342.72995, 401.34695, 637.64703, 315.6708]
2025-08-07 12:26:35,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 74.0, 63.0, 99.0, 94.0, 79.0, 63.0, 76.0, 118.0, 58.0]
2025-08-07 12:26:35,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 1 minute, 59 seconds)
2025-08-07 12:28:30,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:31,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 377.08865 ± 197.694
2025-08-07 12:28:31,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [531.4983, 119.70873, 632.9442, 420.09454, 441.89166, 96.027626, 117.53931, 290.77414, 624.84015, 495.56778]
2025-08-07 12:28:31,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 23.0, 115.0, 76.0, 84.0, 19.0, 23.0, 55.0, 117.0, 89.0]
2025-08-07 12:28:31,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 59 minutes, 56 seconds)
2025-08-07 12:30:25,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:26,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 520.44824 ± 156.148
2025-08-07 12:30:26,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [543.2306, 642.98883, 665.93744, 607.13727, 333.22464, 147.07248, 467.70267, 572.78174, 574.90283, 649.50415]
2025-08-07 12:30:26,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 117.0, 120.0, 110.0, 61.0, 28.0, 90.0, 107.0, 107.0, 122.0]
2025-08-07 12:30:26,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (520.45) for latency MM1Queue_a033_s075
2025-08-07 12:30:26,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 57 minutes, 44 seconds)
2025-08-07 12:32:20,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:21,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 313.49115 ± 204.757
2025-08-07 12:32:21,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [142.09552, 517.899, 398.6451, 163.33865, 119.49165, 519.88434, 366.32095, 107.735374, 698.1281, 101.37275]
2025-08-07 12:32:21,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 94.0, 73.0, 31.0, 23.0, 104.0, 68.0, 21.0, 135.0, 20.0]
2025-08-07 12:32:21,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 55 minutes, 40 seconds)
2025-08-07 12:34:11,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:34:12,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 441.25909 ± 137.844
2025-08-07 12:34:12,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [439.81808, 662.98663, 350.51816, 416.32968, 474.32294, 108.630066, 531.16547, 539.37665, 480.2246, 409.21848]
2025-08-07 12:34:12,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 123.0, 65.0, 72.0, 87.0, 21.0, 97.0, 98.0, 87.0, 73.0]
2025-08-07 12:34:12,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 53 minutes, 27 seconds)
2025-08-07 12:36:03,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:04,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 294.54535 ± 188.846
2025-08-07 12:36:04,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [398.1191, 451.9696, 100.96856, 442.28946, 137.39963, 640.54083, 117.323685, 114.85697, 108.81379, 433.17212]
2025-08-07 12:36:04,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 82.0, 20.0, 95.0, 26.0, 121.0, 23.0, 22.0, 21.0, 76.0]
2025-08-07 12:36:04,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 12 seconds)
2025-08-07 12:38:02,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:38:04,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 430.70825 ± 183.423
2025-08-07 12:38:04,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [401.87863, 552.06946, 733.2874, 505.5278, 559.4993, 511.8152, 370.8951, 108.457, 125.60935, 438.04346]
2025-08-07 12:38:04,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 111.0, 142.0, 93.0, 103.0, 94.0, 69.0, 21.0, 24.0, 83.0]
2025-08-07 12:38:04,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 36 seconds)
2025-08-07 12:40:30,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:31,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 314.88699 ± 177.889
2025-08-07 12:40:31,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [97.12748, 435.37183, 124.895805, 124.691605, 130.4209, 354.66165, 646.3157, 436.85175, 461.18164, 337.35153]
2025-08-07 12:40:31,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 84.0, 24.0, 24.0, 25.0, 65.0, 119.0, 84.0, 87.0, 62.0]
2025-08-07 12:40:31,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 23 seconds)
2025-08-07 12:43:00,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:43:02,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 504.53110 ± 286.659
2025-08-07 12:43:02,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [146.0673, 892.76526, 657.8112, 529.47626, 872.9191, 102.31573, 681.0493, 430.41638, 636.4401, 96.05007]
2025-08-07 12:43:02,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 169.0, 122.0, 96.0, 183.0, 20.0, 126.0, 78.0, 131.0, 19.0]
2025-08-07 12:43:02,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 51 minutes, 18 seconds)
2025-08-07 12:45:31,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:45:32,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 394.79462 ± 148.818
2025-08-07 12:45:32,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [563.29297, 103.24126, 490.62897, 419.67822, 114.12717, 492.87924, 414.52982, 465.59274, 444.9636, 439.0124]
2025-08-07 12:45:32,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 20.0, 89.0, 73.0, 22.0, 89.0, 75.0, 87.0, 81.0, 79.0]
2025-08-07 12:45:32,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 52 minutes, 8 seconds)
2025-08-07 12:48:00,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:48:02,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 449.69223 ± 222.681
2025-08-07 12:48:02,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [562.6947, 314.9787, 731.33673, 125.4361, 471.01105, 713.68854, 735.8145, 414.6811, 124.75005, 302.53082]
2025-08-07 12:48:02,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 57.0, 131.0, 24.0, 84.0, 135.0, 137.0, 76.0, 24.0, 55.0]
2025-08-07 12:48:02,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 52 minutes, 37 seconds)
2025-08-07 12:50:31,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:50:33,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 359.71533 ± 184.351
2025-08-07 12:50:33,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [355.0105, 108.581024, 497.02884, 372.0209, 659.72974, 501.40164, 102.777115, 523.4649, 125.68091, 351.45767]
2025-08-07 12:50:33,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 21.0, 91.0, 70.0, 120.0, 90.0, 20.0, 98.0, 24.0, 64.0]
2025-08-07 12:50:33,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 52 minutes, 25 seconds)
2025-08-07 12:53:01,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:53:03,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 333.14685 ± 195.988
2025-08-07 12:53:03,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [95.86035, 451.5496, 108.849205, 120.03444, 123.919304, 417.11774, 443.4481, 497.47992, 386.15, 687.05975]
2025-08-07 12:53:03,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 83.0, 21.0, 23.0, 24.0, 76.0, 92.0, 91.0, 70.0, 124.0]
2025-08-07 12:53:03,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 50 minutes, 6 seconds)
2025-08-07 12:55:31,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:55:33,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 370.21069 ± 184.422
2025-08-07 12:55:33,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [454.81473, 365.89957, 135.1547, 476.86398, 102.04716, 620.12634, 90.37222, 469.73694, 411.98087, 575.1104]
2025-08-07 12:55:33,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 65.0, 26.0, 85.0, 20.0, 111.0, 18.0, 83.0, 81.0, 100.0]
2025-08-07 12:55:33,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 47 minutes, 33 seconds)
2025-08-07 12:58:02,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:58:03,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 402.32397 ± 215.432
2025-08-07 12:58:03,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [591.95874, 564.356, 119.46072, 108.910286, 720.02606, 136.94612, 373.0296, 579.09454, 541.25354, 288.20422]
2025-08-07 12:58:03,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 121.0, 23.0, 21.0, 142.0, 26.0, 70.0, 123.0, 115.0, 53.0]
2025-08-07 12:58:03,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 45 minutes, 2 seconds)
2025-08-07 13:00:33,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:00:35,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 449.92929 ± 282.997
2025-08-07 13:00:35,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [388.4497, 114.880585, 666.7974, 587.9495, 451.68127, 448.59348, 108.58777, 525.1853, 129.58469, 1077.5835]
2025-08-07 13:00:35,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 22.0, 138.0, 110.0, 82.0, 79.0, 21.0, 94.0, 25.0, 211.0]
2025-08-07 13:00:35,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 42 minutes, 40 seconds)
2025-08-07 13:03:06,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:03:07,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 247.61934 ± 163.131
2025-08-07 13:03:07,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [105.677666, 109.44866, 360.6362, 562.07446, 450.07455, 140.97517, 107.98047, 183.4107, 89.26989, 366.64554]
2025-08-07 13:03:07,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 72.0, 104.0, 80.0, 27.0, 21.0, 35.0, 18.0, 78.0]
2025-08-07 13:03:07,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 40 minutes, 13 seconds)
2025-08-07 13:05:36,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:05:37,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 339.77332 ± 225.128
2025-08-07 13:05:37,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [119.65448, 545.0286, 710.88995, 152.10637, 102.254425, 123.24213, 124.783165, 444.4698, 585.0949, 490.20917]
2025-08-07 13:05:37,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 114.0, 125.0, 29.0, 20.0, 24.0, 24.0, 81.0, 104.0, 105.0]
2025-08-07 13:05:37,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 37 minutes, 43 seconds)
2025-08-07 13:08:05,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:08:06,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 357.17169 ± 180.220
2025-08-07 13:08:06,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [300.69376, 427.84662, 101.808624, 569.3599, 457.83017, 526.017, 559.0546, 412.43298, 101.857666, 114.81553]
2025-08-07 13:08:06,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 77.0, 20.0, 105.0, 83.0, 96.0, 102.0, 76.0, 20.0, 22.0]
2025-08-07 13:08:06,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 35 minutes, 10 seconds)
2025-08-07 13:10:36,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:10:37,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 457.08896 ± 264.119
2025-08-07 13:10:37,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [570.2426, 168.18892, 576.7358, 397.61954, 403.84128, 565.2555, 959.2432, 96.07483, 719.505, 114.18311]
2025-08-07 13:10:37,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 32.0, 113.0, 73.0, 75.0, 119.0, 181.0, 19.0, 130.0, 22.0]
2025-08-07 13:10:37,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 32 minutes, 41 seconds)
2025-08-07 13:13:09,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:13:11,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 471.94980 ± 202.124
2025-08-07 13:13:11,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [609.7221, 407.45523, 589.8635, 605.4695, 140.41795, 102.78274, 388.11517, 563.691, 766.1859, 545.7949]
2025-08-07 13:13:11,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 73.0, 110.0, 113.0, 27.0, 20.0, 75.0, 99.0, 137.0, 114.0]
2025-08-07 13:13:11,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 30 minutes, 14 seconds)
2025-08-07 13:15:40,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:15:41,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 393.77643 ± 282.198
2025-08-07 13:15:41,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [108.058876, 625.9234, 102.13773, 614.1849, 378.5284, 124.84785, 95.129524, 464.50293, 992.96387, 431.487]
2025-08-07 13:15:41,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 118.0, 20.0, 132.0, 71.0, 24.0, 19.0, 84.0, 183.0, 89.0]
2025-08-07 13:15:41,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 27 minutes, 39 seconds)
2025-08-07 13:18:11,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:18:12,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 388.23950 ± 248.964
2025-08-07 13:18:12,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [577.0522, 89.08897, 798.3719, 123.799644, 658.4359, 108.64733, 554.61035, 464.29086, 388.01895, 120.07857]
2025-08-07 13:18:12,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 18.0, 160.0, 24.0, 121.0, 21.0, 99.0, 100.0, 71.0, 23.0]
2025-08-07 13:18:12,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 25 minutes, 9 seconds)
2025-08-07 13:20:42,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:20:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 455.51099 ± 186.780
2025-08-07 13:20:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [545.16797, 580.85754, 418.198, 596.60956, 124.51391, 472.77533, 549.49506, 108.18238, 709.39343, 449.91647]
2025-08-07 13:20:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 115.0, 77.0, 107.0, 24.0, 83.0, 99.0, 21.0, 130.0, 85.0]
2025-08-07 13:20:43,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 22 minutes, 42 seconds)
2025-08-07 13:23:14,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:23:15,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 446.04575 ± 188.554
2025-08-07 13:23:15,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [439.41483, 634.73895, 382.5417, 519.01685, 131.3991, 421.9327, 546.601, 96.37436, 646.34766, 642.0903]
2025-08-07 13:23:15,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 129.0, 69.0, 94.0, 25.0, 74.0, 116.0, 19.0, 129.0, 118.0]
2025-08-07 13:23:15,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 12 seconds)
2025-08-07 13:25:46,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:25:48,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 412.95581 ± 173.755
2025-08-07 13:25:48,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [478.1075, 558.7311, 566.8863, 310.0464, 371.99796, 487.09558, 122.624306, 535.95044, 96.70142, 601.4174]
2025-08-07 13:25:48,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 101.0, 103.0, 57.0, 68.0, 93.0, 24.0, 97.0, 19.0, 111.0]
2025-08-07 13:25:48,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 17 minutes, 39 seconds)
2025-08-07 13:28:17,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:28:19,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 515.74054 ± 166.149
2025-08-07 13:28:19,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [449.74408, 611.501, 123.37225, 575.82513, 452.87503, 535.1508, 539.2737, 506.28287, 531.82904, 831.5514]
2025-08-07 13:28:19,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 111.0, 24.0, 107.0, 84.0, 110.0, 99.0, 91.0, 113.0, 152.0]
2025-08-07 13:28:19,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 9 seconds)
2025-08-07 13:30:49,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:30:50,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 284.96570 ± 250.957
2025-08-07 13:30:50,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [90.11825, 913.3011, 136.71541, 453.24197, 321.61545, 113.876, 106.30879, 465.64426, 124.91624, 123.91953]
2025-08-07 13:30:50,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 173.0, 26.0, 79.0, 58.0, 22.0, 21.0, 97.0, 24.0, 24.0]
2025-08-07 13:30:50,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 37 seconds)
2025-08-07 13:33:20,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:33:22,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 436.65128 ± 267.602
2025-08-07 13:33:22,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [469.77625, 815.8117, 106.529434, 690.07874, 107.7327, 638.2326, 410.947, 90.59585, 278.1555, 758.65326]
2025-08-07 13:33:22,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 160.0, 21.0, 124.0, 21.0, 134.0, 78.0, 18.0, 52.0, 140.0]
2025-08-07 13:33:22,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 6 seconds)
2025-08-07 13:35:50,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:35:51,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 554.66107 ± 205.972
2025-08-07 13:35:51,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [501.25085, 696.39246, 122.95458, 560.6894, 486.73816, 319.1059, 734.88995, 703.0262, 550.2254, 871.3374]
2025-08-07 13:35:51,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 134.0, 24.0, 102.0, 90.0, 61.0, 133.0, 130.0, 99.0, 165.0]
2025-08-07 13:35:51,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (554.66) for latency MM1Queue_a033_s075
2025-08-07 13:35:51,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 33 seconds)
2025-08-07 13:38:22,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:38:24,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 406.67352 ± 276.919
2025-08-07 13:38:24,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [975.2338, 424.79132, 489.1027, 118.73863, 680.98047, 118.91615, 528.06366, 108.18938, 119.64968, 503.06937]
2025-08-07 13:38:24,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 82.0, 98.0, 23.0, 133.0, 23.0, 117.0, 21.0, 23.0, 93.0]
2025-08-07 13:38:24,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 2 seconds)
2025-08-07 13:40:54,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:40:56,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 511.24127 ± 290.973
2025-08-07 13:40:56,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [322.73392, 833.66315, 882.993, 553.58856, 346.64688, 124.183586, 347.7454, 885.73785, 89.92564, 725.19476]
2025-08-07 13:40:56,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 150.0, 168.0, 99.0, 64.0, 24.0, 65.0, 161.0, 18.0, 136.0]
2025-08-07 13:40:56,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 31 seconds)
2025-08-07 13:43:25,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:43:26,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 413.32526 ± 271.033
2025-08-07 13:43:26,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [108.13463, 508.2999, 621.417, 1054.3287, 103.0168, 389.33136, 392.82892, 426.5225, 129.42996, 399.94257]
2025-08-07 13:43:26,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 90.0, 117.0, 195.0, 20.0, 72.0, 73.0, 94.0, 25.0, 88.0]
2025-08-07 13:43:26,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1251 [DEBUG]: Training session finished
