2025-08-07 10:10:48,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc0-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:10:48,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc0-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:10:48,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14c2051cefd0>}
2025-08-07 10:10:48,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 10:10:48,934 baseline-bpql-noiseperc0-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:10:48,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 10:10:48,952 baseline-bpql-noiseperc0-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=648, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 10:10:48,953 baseline-bpql-noiseperc0-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:10:50,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 10:10:50,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 10:12:40,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:41,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 297.06024 ± 29.174
2025-08-07 10:12:41,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [304.2413, 295.97696, 354.39374, 327.96152, 282.67868, 308.87228, 281.67953, 255.24083, 305.1481, 254.40912]
2025-08-07 10:12:41,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 55.0, 66.0, 61.0, 53.0, 58.0, 53.0, 48.0, 57.0, 48.0]
2025-08-07 10:12:41,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (297.06) for latency MM1Queue_a033_s075
2025-08-07 10:12:41,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 2 minutes, 52 seconds)
2025-08-07 10:14:38,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:39,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 429.17007 ± 43.267
2025-08-07 10:14:39,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [463.05746, 431.4118, 474.4681, 379.51727, 460.61206, 349.4889, 405.8583, 431.29617, 495.19403, 400.79645]
2025-08-07 10:14:39,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 89.0, 97.0, 81.0, 97.0, 75.0, 79.0, 93.0, 96.0, 87.0]
2025-08-07 10:14:39,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (429.17) for latency MM1Queue_a033_s075
2025-08-07 10:14:39,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 7 minutes, 8 seconds)
2025-08-07 10:16:37,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:16:39,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 462.04599 ± 152.410
2025-08-07 10:16:39,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [368.0491, 518.65045, 373.82712, 440.72705, 378.3851, 379.535, 538.3032, 879.6109, 342.55118, 400.82056]
2025-08-07 10:16:39,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 102.0, 72.0, 82.0, 75.0, 72.0, 105.0, 187.0, 66.0, 90.0]
2025-08-07 10:16:39,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (462.05) for latency MM1Queue_a033_s075
2025-08-07 10:16:39,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 7 minutes, 46 seconds)
2025-08-07 10:18:37,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:18:38,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 508.07684 ± 115.124
2025-08-07 10:18:38,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [562.77997, 668.188, 705.0754, 415.13464, 524.6508, 376.6256, 353.48972, 397.47357, 559.9393, 517.4115]
2025-08-07 10:18:38,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 138.0, 143.0, 79.0, 116.0, 83.0, 78.0, 88.0, 123.0, 110.0]
2025-08-07 10:18:38,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (508.08) for latency MM1Queue_a033_s075
2025-08-07 10:18:38,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 7 minutes, 11 seconds)
2025-08-07 10:20:36,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:37,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 439.17007 ± 38.514
2025-08-07 10:20:37,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [453.38538, 450.24963, 427.1906, 430.80453, 349.72614, 483.3376, 453.3245, 418.91916, 425.9396, 498.8233]
2025-08-07 10:20:37,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 92.0, 90.0, 82.0, 67.0, 103.0, 87.0, 80.0, 80.0, 94.0]
2025-08-07 10:20:37,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 5 minutes, 46 seconds)
2025-08-07 10:22:35,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:36,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 450.41104 ± 61.023
2025-08-07 10:22:36,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [434.22754, 481.15427, 446.74924, 359.90137, 473.58902, 538.7051, 551.17633, 436.57474, 358.67105, 423.36148]
2025-08-07 10:22:36,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 91.0, 84.0, 75.0, 96.0, 100.0, 115.0, 90.0, 74.0, 83.0]
2025-08-07 10:22:36,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 6 minutes, 30 seconds)
2025-08-07 10:24:33,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:35,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 517.29510 ± 125.697
2025-08-07 10:24:35,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [331.74796, 439.09335, 581.21735, 555.2945, 516.19946, 460.29858, 540.9512, 389.86386, 819.6532, 538.6316]
2025-08-07 10:24:35,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 86.0, 117.0, 123.0, 114.0, 85.0, 116.0, 84.0, 162.0, 116.0]
2025-08-07 10:24:35,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (517.30) for latency MM1Queue_a033_s075
2025-08-07 10:24:35,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 4 minutes, 33 seconds)
2025-08-07 10:26:32,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:33,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 537.13513 ± 92.661
2025-08-07 10:26:33,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [564.0806, 467.8676, 347.47604, 608.2062, 510.4337, 430.23203, 583.7522, 633.6811, 658.77795, 566.8437]
2025-08-07 10:26:33,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 90.0, 67.0, 115.0, 100.0, 81.0, 125.0, 122.0, 129.0, 116.0]
2025-08-07 10:26:33,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (537.14) for latency MM1Queue_a033_s075
2025-08-07 10:26:33,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 2 minutes, 24 seconds)
2025-08-07 10:28:32,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:33,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 517.09650 ± 123.064
2025-08-07 10:28:33,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [472.874, 685.8301, 492.57324, 439.86066, 735.4175, 388.9903, 396.00546, 673.3176, 429.92944, 456.16577]
2025-08-07 10:28:33,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 135.0, 94.0, 85.0, 139.0, 76.0, 75.0, 139.0, 96.0, 87.0]
2025-08-07 10:28:33,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 29 seconds)
2025-08-07 10:30:31,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:32,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 452.24591 ± 50.378
2025-08-07 10:30:32,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [449.28555, 506.3986, 459.14014, 439.42554, 482.7361, 330.55566, 409.12006, 445.71982, 502.99274, 497.08505]
2025-08-07 10:30:32,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 96.0, 93.0, 95.0, 92.0, 73.0, 78.0, 99.0, 97.0, 105.0]
2025-08-07 10:30:32,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 58 minutes, 35 seconds)
2025-08-07 10:32:30,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:32,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 563.15118 ± 93.577
2025-08-07 10:32:32,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [688.9015, 528.6778, 490.83426, 720.55664, 601.02075, 590.2297, 497.17233, 497.84692, 399.2337, 617.03735]
2025-08-07 10:32:32,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 112.0, 103.0, 134.0, 117.0, 112.0, 99.0, 95.0, 84.0, 119.0]
2025-08-07 10:32:32,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (563.15) for latency MM1Queue_a033_s075
2025-08-07 10:32:32,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 37 seconds)
2025-08-07 10:34:29,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:31,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 653.44861 ± 114.507
2025-08-07 10:34:31,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [674.29926, 425.5039, 744.2466, 718.6643, 673.51733, 693.95355, 442.1595, 769.82886, 730.4342, 661.87897]
2025-08-07 10:34:31,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 79.0, 141.0, 150.0, 131.0, 132.0, 98.0, 146.0, 138.0, 125.0]
2025-08-07 10:34:31,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (653.45) for latency MM1Queue_a033_s075
2025-08-07 10:34:31,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 55 minutes, 2 seconds)
2025-08-07 10:36:29,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:30,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 580.24231 ± 138.008
2025-08-07 10:36:30,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [861.9668, 715.9643, 577.83624, 690.83875, 516.5813, 514.30945, 348.46024, 601.5812, 469.6549, 505.2306]
2025-08-07 10:36:30,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 148.0, 119.0, 143.0, 97.0, 97.0, 67.0, 111.0, 88.0, 101.0]
2025-08-07 10:36:30,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 5 seconds)
2025-08-07 10:38:29,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:30,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 606.45905 ± 80.263
2025-08-07 10:38:30,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [558.3911, 790.34265, 581.57623, 592.29736, 582.7159, 489.359, 631.62036, 537.9027, 695.1538, 605.23096]
2025-08-07 10:38:30,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 146.0, 108.0, 110.0, 116.0, 91.0, 120.0, 101.0, 141.0, 114.0]
2025-08-07 10:38:30,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 51 minutes, 12 seconds)
2025-08-07 10:40:29,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:30,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 563.41772 ± 78.289
2025-08-07 10:40:30,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [532.07465, 503.51968, 544.34686, 654.8454, 598.6128, 431.68582, 717.2988, 598.9142, 555.86646, 497.0121]
2025-08-07 10:40:30,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 97.0, 101.0, 128.0, 111.0, 83.0, 131.0, 117.0, 106.0, 93.0]
2025-08-07 10:40:30,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 49 minutes, 28 seconds)
2025-08-07 10:42:28,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:30,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 484.32138 ± 75.554
2025-08-07 10:42:30,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [484.99274, 595.87836, 436.3545, 378.10513, 530.85205, 441.14713, 398.14227, 442.36908, 612.0775, 523.2948]
2025-08-07 10:42:30,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 111.0, 83.0, 80.0, 102.0, 83.0, 78.0, 95.0, 137.0, 96.0]
2025-08-07 10:42:30,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 30 seconds)
2025-08-07 10:44:28,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:30,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 690.65137 ± 140.575
2025-08-07 10:44:30,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [692.0092, 1015.62854, 721.82916, 728.1592, 601.8619, 605.68146, 481.73703, 792.49774, 717.166, 549.94354]
2025-08-07 10:44:30,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 204.0, 140.0, 136.0, 111.0, 114.0, 100.0, 158.0, 135.0, 111.0]
2025-08-07 10:44:30,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (690.65) for latency MM1Queue_a033_s075
2025-08-07 10:44:30,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 45 minutes, 37 seconds)
2025-08-07 10:46:28,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:30,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 644.04218 ± 152.075
2025-08-07 10:46:30,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [885.38306, 527.8304, 724.2256, 551.63983, 722.6634, 511.71857, 506.35867, 465.98636, 635.7518, 908.86444]
2025-08-07 10:46:30,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 106.0, 152.0, 108.0, 141.0, 111.0, 111.0, 102.0, 121.0, 192.0]
2025-08-07 10:46:30,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 59 seconds)
2025-08-07 10:48:27,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:29,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 615.24756 ± 168.911
2025-08-07 10:48:29,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [440.35162, 462.98306, 622.23004, 531.2858, 618.1465, 623.7849, 644.05133, 495.11447, 1071.6831, 642.84467]
2025-08-07 10:48:29,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 87.0, 119.0, 99.0, 129.0, 122.0, 133.0, 93.0, 229.0, 129.0]
2025-08-07 10:48:29,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 41 minutes, 42 seconds)
2025-08-07 10:50:27,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:29,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 592.30933 ± 108.382
2025-08-07 10:50:29,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [484.64716, 490.13483, 575.2225, 611.5184, 740.4841, 510.55603, 827.9586, 489.64246, 592.5703, 600.3588]
2025-08-07 10:50:29,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 91.0, 127.0, 111.0, 139.0, 113.0, 173.0, 102.0, 107.0, 133.0]
2025-08-07 10:50:29,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 34 seconds)
2025-08-07 10:52:27,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:29,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 669.75134 ± 123.614
2025-08-07 10:52:29,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [820.82166, 599.0587, 626.6834, 734.45123, 944.1205, 592.4385, 539.2014, 538.4448, 614.33545, 687.95807]
2025-08-07 10:52:29,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 118.0, 138.0, 141.0, 186.0, 130.0, 101.0, 101.0, 117.0, 151.0]
2025-08-07 10:52:29,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 49 seconds)
2025-08-07 10:54:27,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:29,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 632.28564 ± 107.682
2025-08-07 10:54:29,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [627.08624, 605.85315, 616.4846, 530.2147, 612.8999, 755.5696, 600.15875, 717.06915, 829.6835, 427.8371]
2025-08-07 10:54:29,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 129.0, 133.0, 115.0, 116.0, 141.0, 111.0, 137.0, 159.0, 79.0]
2025-08-07 10:54:29,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 39 seconds)
2025-08-07 10:56:27,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:30,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 706.04114 ± 208.355
2025-08-07 10:56:30,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [647.2916, 764.98895, 454.3005, 467.17813, 684.48834, 743.23425, 602.55756, 1233.8636, 815.92926, 646.5786]
2025-08-07 10:56:30,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 143.0, 88.0, 100.0, 148.0, 156.0, 113.0, 255.0, 160.0, 120.0]
2025-08-07 10:56:30,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (706.04) for latency MM1Queue_a033_s075
2025-08-07 10:56:30,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 47 seconds)
2025-08-07 10:58:28,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:30,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 592.04651 ± 128.724
2025-08-07 10:58:30,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [532.5888, 532.7845, 888.4333, 727.64923, 455.76016, 473.04633, 656.5035, 616.344, 472.99866, 564.357]
2025-08-07 10:58:30,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 101.0, 180.0, 138.0, 97.0, 104.0, 123.0, 118.0, 105.0, 108.0]
2025-08-07 10:58:30,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 14 seconds)
2025-08-07 11:00:29,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:31,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 660.14832 ± 161.841
2025-08-07 11:00:31,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [509.88727, 956.1425, 583.222, 495.52863, 610.7857, 560.4479, 752.414, 909.7283, 487.49716, 735.82947]
2025-08-07 11:00:31,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 183.0, 118.0, 104.0, 132.0, 122.0, 140.0, 175.0, 106.0, 146.0]
2025-08-07 11:00:31,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 33 seconds)
2025-08-07 11:02:28,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:30,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 657.23645 ± 71.980
2025-08-07 11:02:30,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [722.7701, 708.96094, 573.674, 741.4171, 667.835, 580.81354, 751.559, 533.50885, 632.55536, 659.27124]
2025-08-07 11:02:30,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 131.0, 114.0, 150.0, 136.0, 109.0, 145.0, 104.0, 127.0, 125.0]
2025-08-07 11:02:30,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 8 seconds)
2025-08-07 11:04:28,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:30,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 707.85760 ± 133.982
2025-08-07 11:04:30,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [628.98553, 849.5636, 698.03955, 655.85315, 521.87476, 837.46124, 733.6186, 965.474, 648.62537, 539.0803]
2025-08-07 11:04:30,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 163.0, 151.0, 125.0, 107.0, 160.0, 136.0, 183.0, 117.0, 119.0]
2025-08-07 11:04:30,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (707.86) for latency MM1Queue_a033_s075
2025-08-07 11:04:30,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 26 minutes, 14 seconds)
2025-08-07 11:06:28,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:31,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 783.56860 ± 169.817
2025-08-07 11:06:31,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [858.4919, 868.88544, 519.5697, 760.5485, 652.09644, 1014.5205, 608.069, 625.35095, 1046.6525, 881.50165]
2025-08-07 11:06:31,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 168.0, 95.0, 165.0, 129.0, 188.0, 122.0, 121.0, 198.0, 182.0]
2025-08-07 11:06:31,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (783.57) for latency MM1Queue_a033_s075
2025-08-07 11:06:31,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 18 seconds)
2025-08-07 11:08:30,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:32,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 665.35828 ± 132.493
2025-08-07 11:08:32,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1010.0696, 709.19525, 542.92957, 565.01733, 698.9809, 653.082, 735.6972, 562.582, 565.82605, 610.20306]
2025-08-07 11:08:32,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [199.0, 135.0, 112.0, 113.0, 148.0, 133.0, 139.0, 109.0, 108.0, 117.0]
2025-08-07 11:08:32,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 24 seconds)
2025-08-07 11:10:31,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:33,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 740.61560 ± 85.935
2025-08-07 11:10:33,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [756.4063, 616.06195, 672.7106, 667.45935, 754.6013, 892.3068, 689.8475, 887.34827, 715.49274, 753.9213]
2025-08-07 11:10:33,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 118.0, 142.0, 142.0, 162.0, 164.0, 131.0, 169.0, 135.0, 147.0]
2025-08-07 11:10:33,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 29 seconds)
2025-08-07 11:12:31,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:33,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 625.61041 ± 132.417
2025-08-07 11:12:33,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [514.0111, 921.3377, 517.9754, 617.1176, 717.0784, 788.5662, 505.93225, 568.7917, 570.7564, 534.537]
2025-08-07 11:12:33,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 186.0, 108.0, 127.0, 134.0, 167.0, 109.0, 124.0, 105.0, 113.0]
2025-08-07 11:12:33,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 48 seconds)
2025-08-07 11:14:31,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:34,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 742.59827 ± 72.612
2025-08-07 11:14:34,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [824.3772, 781.14374, 706.1745, 853.8818, 747.031, 654.85675, 778.37317, 646.70337, 791.912, 641.52905]
2025-08-07 11:14:34,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 159.0, 135.0, 167.0, 139.0, 123.0, 151.0, 138.0, 154.0, 135.0]
2025-08-07 11:14:34,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 53 seconds)
2025-08-07 11:16:33,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:35,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 755.42212 ± 198.750
2025-08-07 11:16:35,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [584.39276, 1246.3312, 847.96045, 572.90845, 903.3515, 534.29, 715.35846, 679.947, 779.36975, 690.31195]
2025-08-07 11:16:35,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 248.0, 161.0, 127.0, 175.0, 99.0, 136.0, 129.0, 145.0, 138.0]
2025-08-07 11:16:35,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 14 minutes, 55 seconds)
2025-08-07 11:18:34,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:36,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 749.02258 ± 158.708
2025-08-07 11:18:36,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [678.6366, 705.7623, 838.296, 642.53937, 629.5834, 654.04816, 1135.5818, 916.03687, 603.3698, 686.3714]
2025-08-07 11:18:36,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 131.0, 157.0, 135.0, 133.0, 138.0, 225.0, 179.0, 127.0, 128.0]
2025-08-07 11:18:36,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 12 minutes, 52 seconds)
2025-08-07 11:20:34,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:36,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 755.99689 ± 182.987
2025-08-07 11:20:36,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1184.2233, 595.8955, 601.06537, 570.07416, 692.651, 699.5437, 882.41565, 698.92303, 946.4954, 688.68146]
2025-08-07 11:20:36,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 110.0, 112.0, 106.0, 139.0, 132.0, 159.0, 137.0, 175.0, 132.0]
2025-08-07 11:20:36,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 41 seconds)
2025-08-07 11:22:35,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:37,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 699.23425 ± 105.977
2025-08-07 11:22:37,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [696.34406, 571.82404, 673.5257, 703.125, 633.85394, 655.85205, 584.4655, 711.8761, 951.51215, 809.9646]
2025-08-07 11:22:37,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 107.0, 135.0, 128.0, 118.0, 125.0, 109.0, 133.0, 178.0, 152.0]
2025-08-07 11:22:37,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 41 seconds)
2025-08-07 11:24:34,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:36,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 772.54962 ± 76.759
2025-08-07 11:24:36,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [698.5334, 809.12427, 799.6179, 773.36066, 677.8313, 768.7991, 909.6794, 860.8587, 647.18097, 780.5104]
2025-08-07 11:24:36,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 167.0, 150.0, 148.0, 143.0, 150.0, 191.0, 165.0, 141.0, 145.0]
2025-08-07 11:24:36,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 35 seconds)
2025-08-07 11:26:34,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:36,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 709.80090 ± 130.988
2025-08-07 11:26:36,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1036.1172, 537.8416, 582.88074, 660.8354, 637.2218, 749.76984, 768.4693, 709.63336, 765.5156, 649.7244]
2025-08-07 11:26:36,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [201.0, 104.0, 126.0, 136.0, 117.0, 146.0, 161.0, 140.0, 154.0, 134.0]
2025-08-07 11:26:36,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 12 seconds)
2025-08-07 11:28:33,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:35,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 748.12415 ± 177.521
2025-08-07 11:28:35,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [610.2232, 562.7556, 785.42285, 607.7234, 1124.5955, 800.4032, 804.6971, 570.3416, 642.7306, 972.3493]
2025-08-07 11:28:35,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 105.0, 148.0, 116.0, 223.0, 157.0, 162.0, 108.0, 124.0, 194.0]
2025-08-07 11:28:35,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 1 minute, 47 seconds)
2025-08-07 11:30:32,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:33,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 615.93542 ± 106.025
2025-08-07 11:30:33,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [548.85376, 788.74646, 655.1628, 539.68225, 802.2818, 538.74646, 449.13242, 581.86444, 607.8886, 646.9948]
2025-08-07 11:30:33,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 153.0, 125.0, 117.0, 148.0, 108.0, 93.0, 115.0, 123.0, 131.0]
2025-08-07 11:30:34,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 26 seconds)
2025-08-07 11:32:30,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:32,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 689.79401 ± 83.975
2025-08-07 11:32:32,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [632.154, 618.02966, 880.0084, 601.6044, 716.1013, 670.5402, 636.4852, 630.8109, 787.0053, 725.2002]
2025-08-07 11:32:32,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 115.0, 170.0, 113.0, 137.0, 125.0, 120.0, 118.0, 146.0, 136.0]
2025-08-07 11:32:32,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 2 seconds)
2025-08-07 11:34:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:32,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 835.02673 ± 289.989
2025-08-07 11:34:32,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [641.2917, 652.86975, 669.9952, 1625.7622, 613.9238, 735.37646, 825.4711, 691.72815, 1028.2794, 865.5707]
2025-08-07 11:34:32,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 123.0, 139.0, 304.0, 129.0, 159.0, 154.0, 146.0, 206.0, 187.0]
2025-08-07 11:34:32,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (835.03) for latency MM1Queue_a033_s075
2025-08-07 11:34:32,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 9 seconds)
2025-08-07 11:36:28,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:30,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 762.12463 ± 137.365
2025-08-07 11:36:30,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [795.80646, 887.5346, 733.5634, 817.09576, 893.4686, 573.83984, 575.09235, 668.26074, 665.2374, 1011.34735]
2025-08-07 11:36:30,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 182.0, 158.0, 149.0, 168.0, 106.0, 108.0, 125.0, 122.0, 192.0]
2025-08-07 11:36:31,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 52 minutes, 57 seconds)
2025-08-07 11:38:27,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:30,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 801.83429 ± 99.644
2025-08-07 11:38:30,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [947.9563, 807.7704, 717.2281, 961.6829, 799.7423, 737.8368, 656.394, 711.3103, 906.10895, 772.313]
2025-08-07 11:38:30,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [200.0, 151.0, 137.0, 203.0, 156.0, 141.0, 142.0, 139.0, 173.0, 159.0]
2025-08-07 11:38:30,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes, 1 second)
2025-08-07 11:40:26,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:29,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 872.58429 ± 200.636
2025-08-07 11:40:29,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [769.93585, 716.3189, 951.46155, 1167.6819, 745.79126, 1189.484, 825.88104, 792.3252, 1041.6328, 525.32965]
2025-08-07 11:40:29,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 156.0, 180.0, 239.0, 153.0, 238.0, 162.0, 167.0, 221.0, 114.0]
2025-08-07 11:40:29,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (872.58) for latency MM1Queue_a033_s075
2025-08-07 11:40:29,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 7 seconds)
2025-08-07 11:42:27,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:29,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 820.51385 ± 152.035
2025-08-07 11:42:29,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [953.5075, 853.96106, 1113.258, 685.207, 673.6067, 642.1245, 651.3332, 979.6507, 847.6495, 804.8412]
2025-08-07 11:42:29,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 162.0, 217.0, 139.0, 130.0, 143.0, 140.0, 185.0, 164.0, 152.0]
2025-08-07 11:42:29,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 32 seconds)
2025-08-07 11:44:24,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:26,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 766.98969 ± 103.179
2025-08-07 11:44:26,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [706.333, 725.1877, 793.22565, 794.02325, 771.79877, 811.3821, 687.8522, 561.815, 842.6292, 975.64996]
2025-08-07 11:44:26,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 134.0, 150.0, 150.0, 143.0, 153.0, 145.0, 121.0, 160.0, 183.0]
2025-08-07 11:44:26,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 44 minutes, 58 seconds)
2025-08-07 11:46:24,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 817.46008 ± 180.551
2025-08-07 11:46:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [539.05835, 917.0843, 915.917, 963.5547, 503.53415, 603.37317, 896.5635, 880.4554, 948.7011, 1006.35895]
2025-08-07 11:46:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 197.0, 179.0, 191.0, 109.0, 128.0, 183.0, 193.0, 174.0, 212.0]
2025-08-07 11:46:26,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 15 seconds)
2025-08-07 11:48:23,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:26,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 934.96436 ± 197.151
2025-08-07 11:48:26,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1170.722, 657.14197, 1389.8323, 939.736, 978.57574, 819.48425, 825.347, 849.35986, 820.4332, 899.01105]
2025-08-07 11:48:26,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [231.0, 143.0, 274.0, 191.0, 187.0, 164.0, 173.0, 175.0, 161.0, 168.0]
2025-08-07 11:48:26,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (934.96) for latency MM1Queue_a033_s075
2025-08-07 11:48:26,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 18 seconds)
2025-08-07 11:50:21,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:23,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 867.73907 ± 130.978
2025-08-07 11:50:23,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [659.03156, 1052.3362, 949.98114, 924.0174, 933.0852, 838.94196, 696.67456, 709.5633, 1022.88293, 890.8773]
2025-08-07 11:50:23,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 213.0, 187.0, 195.0, 184.0, 157.0, 132.0, 145.0, 215.0, 172.0]
2025-08-07 11:50:23,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 6 seconds)
2025-08-07 11:52:20,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:23,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 841.03967 ± 104.753
2025-08-07 11:52:23,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [994.9269, 635.8187, 778.32837, 782.57874, 780.81006, 821.97656, 1012.191, 871.2736, 889.861, 842.6318]
2025-08-07 11:52:23,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 118.0, 163.0, 148.0, 150.0, 154.0, 194.0, 171.0, 177.0, 159.0]
2025-08-07 11:52:23,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 36 minutes, 55 seconds)
2025-08-07 11:54:20,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:22,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 889.84338 ± 197.559
2025-08-07 11:54:22,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [667.45685, 804.2395, 960.68274, 831.4392, 1095.502, 1005.45874, 510.86954, 786.3567, 1191.6359, 1044.7921]
2025-08-07 11:54:22,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 152.0, 176.0, 158.0, 223.0, 190.0, 100.0, 150.0, 223.0, 198.0]
2025-08-07 11:54:22,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 21 seconds)
2025-08-07 11:56:19,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:23,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1203.22534 ± 338.684
2025-08-07 11:56:23,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1017.5651, 1064.2411, 1885.91, 1221.3414, 1455.313, 794.239, 1661.546, 1117.661, 948.8111, 865.62585]
2025-08-07 11:56:23,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [209.0, 206.0, 359.0, 263.0, 301.0, 167.0, 335.0, 226.0, 192.0, 178.0]
2025-08-07 11:56:23,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1203.23) for latency MM1Queue_a033_s075
2025-08-07 11:56:23,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 26 seconds)
2025-08-07 11:58:20,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:22,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 887.12823 ± 268.583
2025-08-07 11:58:22,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [701.7117, 864.3099, 884.5272, 888.1281, 627.8772, 656.50256, 708.6576, 1279.3602, 767.9646, 1492.2435]
2025-08-07 11:58:22,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 161.0, 185.0, 175.0, 123.0, 129.0, 150.0, 239.0, 159.0, 302.0]
2025-08-07 11:58:22,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 29 seconds)
2025-08-07 12:00:19,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:22,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1097.57312 ± 257.692
2025-08-07 12:00:22,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1075.8885, 886.3039, 1077.0519, 1478.6381, 788.5354, 1174.7277, 877.2133, 842.89343, 1174.3708, 1600.1088]
2025-08-07 12:00:22,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 163.0, 201.0, 308.0, 153.0, 248.0, 158.0, 181.0, 243.0, 308.0]
2025-08-07 12:00:22,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 49 seconds)
2025-08-07 12:02:18,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:21,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1115.49512 ± 229.919
2025-08-07 12:02:21,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1395.1984, 1285.9604, 949.012, 1017.20593, 811.51874, 1445.1693, 864.5352, 1136.0039, 880.95703, 1369.3907]
2025-08-07 12:02:21,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [280.0, 243.0, 188.0, 189.0, 171.0, 275.0, 168.0, 212.0, 183.0, 275.0]
2025-08-07 12:02:21,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 27 minutes, 49 seconds)
2025-08-07 12:04:19,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:04:22,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1077.49792 ± 284.669
2025-08-07 12:04:22,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [720.25275, 1248.1729, 901.4686, 1489.9701, 866.5696, 1121.1238, 866.2441, 1025.9073, 889.1504, 1646.1199]
2025-08-07 12:04:22,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 241.0, 175.0, 294.0, 175.0, 213.0, 167.0, 200.0, 183.0, 317.0]
2025-08-07 12:04:22,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 57 seconds)
2025-08-07 12:06:19,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:22,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 966.25067 ± 208.763
2025-08-07 12:06:22,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [723.3844, 795.0002, 1039.8588, 1232.6958, 915.79706, 1445.6139, 885.223, 820.88464, 939.54456, 864.5035]
2025-08-07 12:06:22,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 160.0, 197.0, 238.0, 183.0, 267.0, 160.0, 153.0, 175.0, 162.0]
2025-08-07 12:06:22,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 54 seconds)
2025-08-07 12:08:18,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:21,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1001.40759 ± 241.245
2025-08-07 12:08:21,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1135.479, 1097.4075, 1039.7234, 785.3831, 1443.4084, 744.8345, 706.4536, 802.6435, 921.7468, 1336.996]
2025-08-07 12:08:21,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 216.0, 215.0, 143.0, 280.0, 144.0, 135.0, 153.0, 180.0, 254.0]
2025-08-07 12:08:21,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 49 seconds)
2025-08-07 12:10:20,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:23,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1218.95044 ± 277.125
2025-08-07 12:10:23,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1541.5204, 1055.6292, 1522.6237, 1440.3607, 1280.2279, 1110.4005, 1298.1696, 551.4379, 1311.6998, 1077.4353]
2025-08-07 12:10:23,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [303.0, 209.0, 309.0, 285.0, 241.0, 217.0, 264.0, 117.0, 268.0, 229.0]
2025-08-07 12:10:23,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1218.95) for latency MM1Queue_a033_s075
2025-08-07 12:10:23,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 9 seconds)
2025-08-07 12:12:20,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:23,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 898.83459 ± 200.123
2025-08-07 12:12:23,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [725.458, 710.82043, 777.4902, 847.69824, 994.8574, 679.01434, 1185.288, 737.06226, 1094.8394, 1235.8171]
2025-08-07 12:12:23,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 143.0, 155.0, 161.0, 183.0, 126.0, 227.0, 139.0, 215.0, 251.0]
2025-08-07 12:12:23,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 13 seconds)
2025-08-07 12:14:19,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:22,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1089.58411 ± 417.125
2025-08-07 12:14:22,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [772.702, 952.21954, 863.37994, 898.6762, 1524.8103, 2070.5579, 659.99554, 1256.8759, 1202.1418, 694.4823]
2025-08-07 12:14:22,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 183.0, 163.0, 167.0, 288.0, 391.0, 137.0, 237.0, 248.0, 127.0]
2025-08-07 12:14:22,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 2 seconds)
2025-08-07 12:16:21,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:24,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1084.02319 ± 330.142
2025-08-07 12:16:24,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [953.9033, 1017.89703, 1019.9343, 1012.4519, 892.32623, 712.3398, 1923.1935, 942.8668, 917.2217, 1448.098]
2025-08-07 12:16:24,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [209.0, 201.0, 191.0, 207.0, 177.0, 152.0, 387.0, 188.0, 169.0, 270.0]
2025-08-07 12:16:24,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 15 seconds)
2025-08-07 12:18:20,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:22,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 916.29999 ± 246.405
2025-08-07 12:18:22,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1153.4924, 1529.7837, 710.5746, 683.1305, 949.0639, 687.9126, 873.6147, 915.5225, 767.2173, 892.68787]
2025-08-07 12:18:22,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 295.0, 156.0, 139.0, 179.0, 146.0, 171.0, 170.0, 143.0, 174.0]
2025-08-07 12:18:22,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes, 10 seconds)
2025-08-07 12:20:21,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:25,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1136.92578 ± 335.178
2025-08-07 12:20:25,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1183.4419, 1434.2783, 842.30035, 1094.2279, 599.9844, 1079.4281, 1033.1852, 868.4393, 1383.682, 1850.2897]
2025-08-07 12:20:25,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [235.0, 298.0, 177.0, 213.0, 125.0, 215.0, 199.0, 162.0, 286.0, 361.0]
2025-08-07 12:20:25,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 9 seconds)
2025-08-07 12:22:24,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:27,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1107.34985 ± 370.073
2025-08-07 12:22:27,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [757.8613, 1900.6388, 1100.9258, 820.5327, 898.8301, 1187.4474, 1120.9406, 636.2103, 1034.4187, 1615.6924]
2025-08-07 12:22:27,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 348.0, 208.0, 159.0, 170.0, 215.0, 217.0, 132.0, 186.0, 294.0]
2025-08-07 12:22:27,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 28 seconds)
2025-08-07 12:24:22,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:24:25,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1254.07520 ± 475.259
2025-08-07 12:24:25,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1585.1127, 933.7947, 802.2512, 721.4089, 788.31104, 1734.8657, 812.30176, 1457.4354, 1572.0067, 2133.2634]
2025-08-07 12:24:25,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [333.0, 189.0, 148.0, 155.0, 168.0, 349.0, 176.0, 312.0, 297.0, 397.0]
2025-08-07 12:24:25,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1254.08) for latency MM1Queue_a033_s075
2025-08-07 12:24:25,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 20 seconds)
2025-08-07 12:26:22,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:25,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 885.68713 ± 254.197
2025-08-07 12:26:25,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [739.9678, 702.9043, 578.8939, 820.20074, 737.2525, 1056.6467, 1542.8352, 933.31, 811.6626, 933.1974]
2025-08-07 12:26:25,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 137.0, 125.0, 155.0, 155.0, 206.0, 292.0, 184.0, 174.0, 180.0]
2025-08-07 12:26:25,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 5 seconds)
2025-08-07 12:28:20,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:24,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1299.41760 ± 505.989
2025-08-07 12:28:24,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [621.094, 1834.9489, 1036.1721, 2180.113, 1553.4269, 812.38654, 896.17175, 1451.3804, 1794.6205, 813.8623]
2025-08-07 12:28:24,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 339.0, 207.0, 413.0, 304.0, 173.0, 169.0, 294.0, 340.0, 152.0]
2025-08-07 12:28:24,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1299.42) for latency MM1Queue_a033_s075
2025-08-07 12:28:24,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 9 seconds)
2025-08-07 12:30:18,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:22,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1255.59412 ± 234.865
2025-08-07 12:30:22,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1659.6678, 1422.4463, 1229.0712, 1336.5502, 1090.2406, 1296.2902, 1218.1163, 1134.4692, 731.2368, 1437.8539]
2025-08-07 12:30:22,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [308.0, 266.0, 225.0, 254.0, 210.0, 237.0, 229.0, 234.0, 135.0, 281.0]
2025-08-07 12:30:22,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 40 seconds)
2025-08-07 12:32:17,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:21,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1355.52124 ± 461.740
2025-08-07 12:32:21,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1064.3248, 2478.2031, 1364.9824, 1498.4395, 1314.9626, 1741.3921, 1239.0605, 1217.307, 764.27747, 872.26324]
2025-08-07 12:32:21,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 493.0, 253.0, 289.0, 250.0, 342.0, 255.0, 233.0, 157.0, 168.0]
2025-08-07 12:32:21,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1355.52) for latency MM1Queue_a033_s075
2025-08-07 12:32:21,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 26 seconds)
2025-08-07 12:34:16,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:34:20,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1176.32300 ± 347.446
2025-08-07 12:34:20,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [667.6259, 1547.5164, 631.2377, 1145.8782, 1065.354, 1101.3944, 1219.3992, 1805.9266, 1477.6049, 1101.2942]
2025-08-07 12:34:20,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 295.0, 136.0, 227.0, 212.0, 230.0, 238.0, 348.0, 292.0, 219.0]
2025-08-07 12:34:20,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 26 seconds)
2025-08-07 12:36:16,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:19,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1078.05347 ± 297.709
2025-08-07 12:36:19,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1369.3176, 674.7907, 1506.231, 801.2293, 915.64496, 1090.5677, 1203.2488, 1183.6313, 619.0, 1416.8739]
2025-08-07 12:36:19,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [251.0, 127.0, 288.0, 149.0, 171.0, 202.0, 242.0, 225.0, 126.0, 258.0]
2025-08-07 12:36:19,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 30 seconds)
2025-08-07 12:38:16,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:38:19,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 994.52478 ± 209.990
2025-08-07 12:38:19,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1161.2594, 748.1848, 1198.727, 1302.1766, 659.3559, 837.99115, 854.4929, 888.9959, 1178.8506, 1115.2126]
2025-08-07 12:38:19,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [226.0, 156.0, 225.0, 255.0, 136.0, 168.0, 158.0, 176.0, 237.0, 229.0]
2025-08-07 12:38:19,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 31 seconds)
2025-08-07 12:40:17,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:21,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1326.91199 ± 419.217
2025-08-07 12:40:21,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1100.0759, 1279.5541, 1224.5964, 1456.2461, 997.5783, 1428.9506, 2178.6375, 1924.223, 856.2714, 822.98724]
2025-08-07 12:40:21,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [200.0, 249.0, 250.0, 267.0, 187.0, 280.0, 404.0, 374.0, 167.0, 152.0]
2025-08-07 12:40:21,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 55 seconds)
2025-08-07 12:42:15,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:42:20,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1443.06079 ± 773.788
2025-08-07 12:42:20,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [830.7152, 2876.8352, 800.27814, 1234.3658, 1049.3766, 1087.2701, 1008.75757, 815.2873, 1844.2268, 2883.4958]
2025-08-07 12:42:20,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 553.0, 154.0, 253.0, 199.0, 217.0, 190.0, 155.0, 342.0, 582.0]
2025-08-07 12:42:20,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1443.06) for latency MM1Queue_a033_s075
2025-08-07 12:42:20,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 51 seconds)
2025-08-07 12:44:15,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:44:19,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1305.28394 ± 632.247
2025-08-07 12:44:19,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2046.4666, 657.0655, 800.1333, 768.2559, 1109.7595, 1594.6484, 625.4365, 2642.2341, 1620.0624, 1188.7766]
2025-08-07 12:44:19,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [389.0, 138.0, 169.0, 154.0, 214.0, 338.0, 133.0, 534.0, 323.0, 240.0]
2025-08-07 12:44:19,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 56 seconds)
2025-08-07 12:46:13,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:46:16,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1173.65625 ± 250.835
2025-08-07 12:46:16,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [875.8363, 1515.6428, 1322.3868, 1518.8356, 1131.0369, 875.54156, 962.101, 1178.1732, 907.6247, 1449.3834]
2025-08-07 12:46:16,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 315.0, 256.0, 288.0, 212.0, 171.0, 191.0, 216.0, 182.0, 280.0]
2025-08-07 12:46:16,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 46 seconds)
2025-08-07 12:48:12,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:48:17,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1660.06738 ± 502.321
2025-08-07 12:48:17,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1954.0359, 1164.5924, 2238.0713, 1128.1057, 1540.0483, 816.27704, 1715.994, 1694.9889, 1777.5972, 2570.9622]
2025-08-07 12:48:17,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [366.0, 251.0, 421.0, 210.0, 295.0, 179.0, 335.0, 315.0, 326.0, 521.0]
2025-08-07 12:48:17,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (1660.07) for latency MM1Queue_a033_s075
2025-08-07 12:48:17,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 54 seconds)
2025-08-07 12:50:13,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:50:17,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1291.95398 ± 411.802
2025-08-07 12:50:17,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [811.3399, 1766.1775, 1462.9391, 1185.0574, 1097.249, 2277.1973, 1125.7415, 1102.139, 1100.0056, 991.6947]
2025-08-07 12:50:17,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 329.0, 274.0, 223.0, 207.0, 437.0, 215.0, 208.0, 211.0, 221.0]
2025-08-07 12:50:17,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 45 seconds)
2025-08-07 12:52:14,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:52:19,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1579.99731 ± 819.275
2025-08-07 12:52:19,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1424.1571, 1546.2959, 1086.9313, 949.41614, 1810.4148, 1497.6327, 3783.5598, 1090.53, 697.19183, 1913.8436]
2025-08-07 12:52:19,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [286.0, 309.0, 193.0, 202.0, 352.0, 281.0, 747.0, 212.0, 144.0, 376.0]
2025-08-07 12:52:19,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 57 seconds)
2025-08-07 12:54:20,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:54:24,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1448.44165 ± 486.335
2025-08-07 12:54:24,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2003.45, 1406.4048, 1980.0356, 1366.7203, 1535.1224, 952.0276, 555.8188, 1047.9894, 1443.3987, 2193.45]
2025-08-07 12:54:24,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [402.0, 262.0, 375.0, 257.0, 286.0, 180.0, 116.0, 206.0, 280.0, 430.0]
2025-08-07 12:54:24,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 19 seconds)
2025-08-07 12:56:14,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:56:19,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1631.95520 ± 636.643
2025-08-07 12:56:19,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2242.0146, 976.75104, 1552.9142, 1260.7485, 1962.6461, 1187.208, 1433.1842, 2048.3372, 707.423, 2948.325]
2025-08-07 12:56:19,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [409.0, 194.0, 331.0, 233.0, 393.0, 241.0, 259.0, 391.0, 147.0, 560.0]
2025-08-07 12:56:19,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 7 seconds)
2025-08-07 12:58:16,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:58:20,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1393.18823 ± 909.563
2025-08-07 12:58:20,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [740.02, 969.54346, 2124.8538, 621.6938, 1200.2035, 3704.0579, 805.3071, 1743.6046, 564.66034, 1457.939]
2025-08-07 12:58:20,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 198.0, 438.0, 137.0, 244.0, 764.0, 179.0, 340.0, 122.0, 272.0]
2025-08-07 12:58:20,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 10 seconds)
2025-08-07 13:00:18,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:00:23,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1627.98657 ± 675.750
2025-08-07 13:00:23,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [848.1794, 1561.7805, 1240.7638, 1181.1293, 1937.9672, 1259.5111, 3363.4465, 1238.5039, 2081.021, 1567.5621]
2025-08-07 13:00:23,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 288.0, 266.0, 246.0, 378.0, 247.0, 645.0, 236.0, 410.0, 295.0]
2025-08-07 13:00:23,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 17 seconds)
2025-08-07 13:02:22,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:02:27,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1632.28857 ± 550.079
2025-08-07 13:02:27,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1541.4258, 1760.0886, 1384.0402, 1124.891, 1378.6794, 1056.7866, 1509.5403, 2504.355, 1259.4276, 2803.6516]
2025-08-07 13:02:27,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [320.0, 352.0, 283.0, 220.0, 279.0, 200.0, 311.0, 461.0, 251.0, 521.0]
2025-08-07 13:02:27,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 22 seconds)
2025-08-07 13:04:17,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:04:24,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2228.66992 ± 927.974
2025-08-07 13:04:24,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1288.8447, 1735.7952, 2003.8154, 2005.9303, 1358.9053, 3317.4556, 4080.2605, 1780.0781, 3301.2231, 1414.3909]
2025-08-07 13:04:24,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [254.0, 314.0, 383.0, 365.0, 257.0, 636.0, 800.0, 334.0, 610.0, 279.0]
2025-08-07 13:04:24,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (2228.67) for latency MM1Queue_a033_s075
2025-08-07 13:04:24,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 59 seconds)
2025-08-07 13:06:22,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:06:26,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1311.56226 ± 360.434
2025-08-07 13:06:26,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1023.2721, 1200.7551, 1300.8235, 1312.1816, 1985.5328, 1667.9537, 1197.8085, 1001.5111, 1709.7952, 715.99]
2025-08-07 13:06:26,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 240.0, 249.0, 259.0, 390.0, 356.0, 233.0, 201.0, 319.0, 140.0]
2025-08-07 13:06:26,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 16 seconds)
2025-08-07 13:08:21,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:08:27,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1830.91431 ± 767.809
2025-08-07 13:08:27,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1037.6498, 1377.2692, 1220.5369, 1119.1587, 2066.3586, 2419.0278, 2422.5227, 1334.3901, 3610.401, 1701.8273]
2025-08-07 13:08:27,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [227.0, 249.0, 227.0, 229.0, 387.0, 442.0, 452.0, 252.0, 687.0, 323.0]
2025-08-07 13:08:27,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 13 seconds)
2025-08-07 13:10:30,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:10:34,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1152.18774 ± 615.382
2025-08-07 13:10:34,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1093.3771, 905.79407, 1096.24, 925.44135, 995.41656, 543.01746, 815.7577, 954.61816, 1278.0657, 2914.148]
2025-08-07 13:10:34,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [228.0, 196.0, 230.0, 195.0, 218.0, 118.0, 182.0, 214.0, 264.0, 564.0]
2025-08-07 13:10:34,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 22 seconds)
2025-08-07 13:12:28,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:12:32,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1383.66675 ± 591.549
2025-08-07 13:12:32,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1533.5181, 1203.6514, 926.9447, 1546.8854, 980.402, 2736.25, 908.3524, 969.7135, 888.94635, 2142.005]
2025-08-07 13:12:32,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [293.0, 227.0, 181.0, 317.0, 220.0, 533.0, 184.0, 199.0, 194.0, 423.0]
2025-08-07 13:12:32,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 9 seconds)
2025-08-07 13:14:30,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:14:34,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1635.41431 ± 581.087
2025-08-07 13:14:34,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1027.1276, 1927.2714, 1223.9651, 1555.0419, 1546.3898, 3165.5793, 1044.3716, 1612.9639, 1473.5247, 1777.9075]
2025-08-07 13:14:34,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [222.0, 369.0, 239.0, 293.0, 282.0, 599.0, 225.0, 301.0, 288.0, 357.0]
2025-08-07 13:14:34,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 16 seconds)
2025-08-07 13:16:31,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:16:39,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2881.44385 ± 1112.235
2025-08-07 13:16:39,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3916.88, 2678.9727, 1848.4172, 2494.1343, 2819.5278, 2422.6985, 1137.3832, 3545.5854, 2570.1245, 5380.715]
2025-08-07 13:16:39,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [748.0, 525.0, 361.0, 480.0, 577.0, 474.0, 218.0, 700.0, 499.0, 1000.0]
2025-08-07 13:16:39,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (2881.44) for latency MM1Queue_a033_s075
2025-08-07 13:16:39,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 18 seconds)
2025-08-07 13:18:29,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:18:36,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2352.00415 ± 1234.625
2025-08-07 13:18:36,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2383.4285, 714.0864, 2199.2146, 1932.7325, 3327.931, 1174.1322, 1943.887, 5305.2095, 1521.7981, 3017.621]
2025-08-07 13:18:36,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [480.0, 147.0, 413.0, 374.0, 628.0, 237.0, 356.0, 1000.0, 305.0, 551.0]
2025-08-07 13:18:36,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 11 seconds)
2025-08-07 13:20:33,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:20:42,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2926.45215 ± 1042.647
2025-08-07 13:20:42,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2149.3665, 5102.4795, 2675.141, 1922.1033, 3486.0625, 2136.9978, 3804.67, 1381.2587, 3504.5754, 3101.867]
2025-08-07 13:20:42,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [415.0, 1000.0, 536.0, 383.0, 669.0, 401.0, 742.0, 278.0, 658.0, 629.0]
2025-08-07 13:20:42,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1226 [INFO]: New best (2926.45) for latency MM1Queue_a033_s075
2025-08-07 13:20:42,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 7 seconds)
2025-08-07 13:22:43,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:22:48,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1637.48364 ± 763.670
2025-08-07 13:22:48,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2395.154, 1466.2738, 1080.5824, 781.0074, 628.2354, 2148.8113, 3226.9111, 1082.7606, 1662.7014, 1902.3982]
2025-08-07 13:22:48,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [446.0, 281.0, 197.0, 161.0, 132.0, 410.0, 613.0, 212.0, 335.0, 380.0]
2025-08-07 13:22:48,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 12 seconds)
2025-08-07 13:24:40,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:24:47,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2322.66187 ± 1195.483
2025-08-07 13:24:47,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2470.3474, 1043.0458, 1027.9022, 1052.2439, 1717.3226, 5113.78, 2903.9802, 2951.3176, 2022.3005, 2924.3787]
2025-08-07 13:24:47,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [485.0, 186.0, 212.0, 206.0, 348.0, 1000.0, 564.0, 600.0, 408.0, 545.0]
2025-08-07 13:24:47,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 7 seconds)
2025-08-07 13:26:40,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:26:46,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2078.30420 ± 1335.114
2025-08-07 13:26:46,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1791.0514, 1603.9977, 2283.647, 896.76776, 1121.1484, 802.69354, 5307.399, 1660.9801, 3721.571, 1593.7847]
2025-08-07 13:26:46,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [369.0, 283.0, 437.0, 184.0, 236.0, 176.0, 1000.0, 305.0, 697.0, 310.0]
2025-08-07 13:26:46,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 2 seconds)
2025-08-07 13:28:45,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:28:51,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2078.61548 ± 1036.934
2025-08-07 13:28:51,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2488.1843, 1202.0684, 1308.6617, 2697.3083, 3246.5125, 4348.898, 1451.605, 1356.4866, 1747.084, 939.34503]
2025-08-07 13:28:51,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [489.0, 245.0, 260.0, 534.0, 639.0, 818.0, 288.0, 252.0, 369.0, 201.0]
2025-08-07 13:28:51,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 3 seconds)
2025-08-07 13:30:38,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:30:46,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2788.70850 ± 1334.607
2025-08-07 13:30:46,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2664.1204, 5142.0483, 3227.2236, 1644.0662, 5266.4263, 2093.4033, 1183.9194, 2743.7239, 1656.382, 2265.7715]
2025-08-07 13:30:46,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [483.0, 975.0, 605.0, 336.0, 1000.0, 398.0, 235.0, 518.0, 312.0, 414.0]
2025-08-07 13:30:46,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-humanoid):1251 [DEBUG]: Training session finished
