2025-05-13 09:06:40,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mda-highdim-mem16
2025-05-13 09:06:40,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mda-highdim-mem16
2025-05-13 09:06:40,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14b1187e6b90>}
2025-05-13 09:06:40,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:40,871 baseline-bpql-mda-noisy-humanoid:91 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-05-13 09:06:40,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-13 09:06:40,889 baseline-bpql-mda-noisy-humanoid:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-13 09:06:40,889 baseline-bpql-mda-noisy-humanoid:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:40,898 baseline-bpql-mda-noisy-humanoid:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(17, 512, batch_first=True)
)
2025-05-13 09:06:41,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:41,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-13 09:11:13,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:11:14,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 185.99869 ± 18.176
2025-05-13 09:11:14,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [175.14027, 161.14157, 201.18823, 183.44556, 183.64946, 179.77596, 204.5319, 208.4142, 154.91881, 207.78102]
2025-05-13 09:11:14,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 33.0, 40.0, 39.0, 37.0, 36.0, 40.0, 43.0, 32.0, 42.0]
2025-05-13 09:11:14,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (186.00) for latency MM1Queue_a033_s075
2025-05-13 09:11:14,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 29 minutes, 58 seconds)
2025-05-13 09:15:56,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:15:57,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 296.98578 ± 48.170
2025-05-13 09:15:57,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [273.99634, 296.29413, 242.20844, 280.9898, 267.8513, 361.7091, 241.89581, 322.33563, 399.5019, 283.07498]
2025-05-13 09:15:57,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 57.0, 48.0, 54.0, 51.0, 71.0, 48.0, 63.0, 75.0, 55.0]
2025-05-13 09:15:57,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (296.99) for latency MM1Queue_a033_s075
2025-05-13 09:15:57,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 33 minutes, 21 seconds)
2025-05-13 09:20:40,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:20:41,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 395.74939 ± 57.713
2025-05-13 09:20:41,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [420.1478, 420.62152, 508.6955, 398.86066, 354.06235, 465.66434, 341.04572, 368.9664, 376.5398, 302.88986]
2025-05-13 09:20:41,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 79.0, 94.0, 75.0, 67.0, 86.0, 65.0, 70.0, 70.0, 57.0]
2025-05-13 09:20:41,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (395.75) for latency MM1Queue_a033_s075
2025-05-13 09:20:41,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 32 minutes, 26 seconds)
2025-05-13 09:25:26,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:25:28,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 355.62433 ± 58.305
2025-05-13 09:25:28,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [296.65738, 337.62656, 411.11063, 321.46677, 421.23672, 449.0223, 349.07816, 343.30026, 245.71542, 381.02924]
2025-05-13 09:25:28,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 64.0, 78.0, 61.0, 85.0, 87.0, 65.0, 64.0, 48.0, 71.0]
2025-05-13 09:25:28,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 30 minutes, 26 seconds)
2025-05-13 09:30:12,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:30:14,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 488.10162 ± 130.918
2025-05-13 09:30:14,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [489.96805, 395.86127, 412.00815, 397.93402, 519.8084, 417.11087, 404.21616, 486.3993, 501.22357, 856.4862]
2025-05-13 09:30:14,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 71.0, 73.0, 71.0, 109.0, 75.0, 75.0, 89.0, 93.0, 166.0]
2025-05-13 09:30:14,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (488.10) for latency MM1Queue_a033_s075
2025-05-13 09:30:14,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 27 minutes, 8 seconds)
2025-05-13 09:34:58,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:35:00,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 519.91266 ± 98.213
2025-05-13 09:35:00,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [631.02216, 445.84647, 679.7667, 395.18057, 592.463, 443.9712, 627.45166, 461.99533, 412.9801, 508.44934]
2025-05-13 09:35:00,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 80.0, 128.0, 72.0, 109.0, 88.0, 118.0, 94.0, 86.0, 92.0]
2025-05-13 09:35:00,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (519.91) for latency MM1Queue_a033_s075
2025-05-13 09:35:00,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 26 minutes, 37 seconds)
2025-05-13 09:39:46,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:39:48,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 464.58136 ± 124.971
2025-05-13 09:39:48,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [478.8889, 512.819, 146.73369, 590.5958, 384.5225, 411.59833, 552.1727, 510.14252, 460.4995, 597.8404]
2025-05-13 09:39:48,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 93.0, 28.0, 122.0, 73.0, 76.0, 103.0, 107.0, 93.0, 112.0]
2025-05-13 09:39:48,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 7 hours, 23 minutes, 48 seconds)
2025-05-13 09:44:33,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:44:35,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 496.25473 ± 79.058
2025-05-13 09:44:35,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [406.81805, 422.66995, 489.29327, 501.4507, 458.56055, 655.02124, 563.0535, 487.59937, 394.84882, 583.23206]
2025-05-13 09:44:35,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 77.0, 89.0, 91.0, 83.0, 128.0, 102.0, 98.0, 74.0, 106.0]
2025-05-13 09:44:35,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 7 hours, 19 minutes, 47 seconds)
2025-05-13 09:49:19,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:49:21,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 498.05411 ± 75.681
2025-05-13 09:49:21,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [456.98984, 524.29596, 527.7332, 577.81665, 380.68573, 436.27484, 481.8945, 467.24384, 664.32715, 463.27936]
2025-05-13 09:49:21,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 98.0, 99.0, 109.0, 70.0, 80.0, 88.0, 98.0, 126.0, 94.0]
2025-05-13 09:49:21,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 7 hours, 14 minutes, 51 seconds)
2025-05-13 09:54:06,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:54:08,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 505.66196 ± 45.604
2025-05-13 09:54:08,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [460.2212, 454.9872, 541.83655, 454.5382, 586.1192, 488.93698, 491.04697, 474.8224, 548.90765, 555.20306]
2025-05-13 09:54:08,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 82.0, 100.0, 82.0, 107.0, 93.0, 89.0, 84.0, 105.0, 100.0]
2025-05-13 09:54:08,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 7 hours, 10 minutes, 15 seconds)
2025-05-13 09:58:55,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:58:57,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 567.94177 ± 88.199
2025-05-13 09:58:57,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [581.8464, 466.2124, 608.34106, 757.61914, 593.373, 655.44, 490.35803, 470.45718, 563.96735, 491.8034]
2025-05-13 09:58:57,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 95.0, 110.0, 152.0, 112.0, 130.0, 103.0, 93.0, 106.0, 99.0]
2025-05-13 09:58:57,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (567.94) for latency MM1Queue_a033_s075
2025-05-13 09:58:57,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 7 hours, 6 minutes, 34 seconds)
2025-05-13 10:03:42,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:03:44,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 550.66382 ± 94.255
2025-05-13 10:03:44,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [391.50546, 495.81473, 467.71457, 511.2103, 579.154, 593.23804, 629.05927, 702.4483, 667.33386, 469.15967]
2025-05-13 10:03:44,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 96.0, 93.0, 93.0, 106.0, 105.0, 111.0, 125.0, 122.0, 86.0]
2025-05-13 10:03:44,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 7 hours, 1 minute, 13 seconds)
2025-05-13 10:08:33,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:08:35,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 572.64832 ± 80.092
2025-05-13 10:08:35,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [601.6869, 396.1959, 648.4188, 662.737, 605.18353, 453.80646, 568.77875, 582.8821, 579.7477, 627.0457]
2025-05-13 10:08:35,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 73.0, 117.0, 122.0, 114.0, 92.0, 105.0, 111.0, 104.0, 113.0]
2025-05-13 10:08:35,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (572.65) for latency MM1Queue_a033_s075
2025-05-13 10:08:35,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 57 minutes, 29 seconds)
2025-05-13 10:13:17,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:13:19,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 539.95715 ± 66.231
2025-05-13 10:13:19,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [521.96075, 510.21988, 488.68893, 595.473, 591.6184, 438.98038, 674.9579, 506.3002, 487.27393, 584.0983]
2025-05-13 10:13:19,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 92.0, 90.0, 105.0, 107.0, 78.0, 124.0, 89.0, 89.0, 108.0]
2025-05-13 10:13:19,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 52 minutes, 17 seconds)
2025-05-13 10:18:06,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:18:09,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 694.39844 ± 226.259
2025-05-13 10:18:09,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [465.6709, 510.07776, 871.7457, 740.58167, 620.14557, 737.38983, 1272.7383, 634.25006, 511.2367, 580.14813]
2025-05-13 10:18:09,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 91.0, 167.0, 138.0, 113.0, 142.0, 250.0, 115.0, 93.0, 106.0]
2025-05-13 10:18:09,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (694.40) for latency MM1Queue_a033_s075
2025-05-13 10:18:09,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 48 minutes, 17 seconds)
2025-05-13 10:22:51,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:22:53,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 568.88556 ± 156.275
2025-05-13 10:22:53,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [324.4661, 732.50226, 366.9261, 541.4039, 545.6919, 612.5074, 679.8901, 422.24515, 847.67114, 615.5516]
2025-05-13 10:22:53,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 135.0, 67.0, 111.0, 97.0, 111.0, 124.0, 75.0, 168.0, 126.0]
2025-05-13 10:22:53,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 41 minutes, 54 seconds)
2025-05-13 10:27:37,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:27:39,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 675.64136 ± 78.279
2025-05-13 10:27:39,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [603.1218, 748.7576, 772.57245, 628.09784, 630.42633, 694.51697, 745.3942, 528.33167, 768.1567, 637.03796]
2025-05-13 10:27:39,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 134.0, 141.0, 117.0, 115.0, 131.0, 152.0, 94.0, 144.0, 128.0]
2025-05-13 10:27:39,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 37 minutes, 1 second)
2025-05-13 10:32:26,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:32:28,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 552.69391 ± 120.978
2025-05-13 10:32:28,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [783.5994, 476.35965, 512.9092, 664.1665, 657.6462, 367.72803, 529.78534, 629.7964, 479.1061, 425.84244]
2025-05-13 10:32:28,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 89.0, 92.0, 128.0, 118.0, 68.0, 104.0, 119.0, 89.0, 78.0]
2025-05-13 10:32:28,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 31 minutes, 38 seconds)
2025-05-13 10:37:10,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:37:12,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 627.78687 ± 156.011
2025-05-13 10:37:12,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [446.44797, 753.082, 624.9293, 346.21317, 687.0118, 792.1215, 661.172, 710.7312, 824.31024, 431.84967]
2025-05-13 10:37:12,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 142.0, 114.0, 64.0, 127.0, 146.0, 122.0, 123.0, 152.0, 80.0]
2025-05-13 10:37:12,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 26 minutes, 54 seconds)
2025-05-13 10:41:57,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:42:00,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 546.12469 ± 101.083
2025-05-13 10:42:00,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [613.8535, 728.78125, 639.93384, 517.33026, 556.9826, 521.9602, 381.10153, 428.9033, 615.96533, 456.4348]
2025-05-13 10:42:00,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 140.0, 117.0, 107.0, 105.0, 97.0, 69.0, 76.0, 119.0, 93.0]
2025-05-13 10:42:00,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 6 hours, 21 minutes, 33 seconds)
2025-05-13 10:46:43,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:46:45,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 608.33533 ± 101.118
2025-05-13 10:46:45,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [678.87366, 676.9583, 698.86444, 502.35236, 747.34924, 480.4859, 711.71423, 598.915, 477.9944, 509.84525]
2025-05-13 10:46:45,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 134.0, 131.0, 100.0, 133.0, 99.0, 127.0, 122.0, 89.0, 100.0]
2025-05-13 10:46:45,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 6 hours, 17 minutes, 7 seconds)
2025-05-13 10:51:28,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:51:31,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 688.57642 ± 95.429
2025-05-13 10:51:31,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [613.5847, 826.71466, 796.1557, 544.45605, 716.5471, 661.6233, 637.4376, 817.727, 573.98083, 697.5369]
2025-05-13 10:51:31,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 148.0, 140.0, 101.0, 135.0, 123.0, 118.0, 152.0, 106.0, 132.0]
2025-05-13 10:51:31,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 6 hours, 12 minutes, 13 seconds)
2025-05-13 10:56:15,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:56:17,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 666.87634 ± 118.689
2025-05-13 10:56:17,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [571.79877, 663.9655, 906.5873, 657.6994, 627.5, 513.63245, 831.645, 534.43243, 726.39435, 635.10815]
2025-05-13 10:56:17,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 126.0, 167.0, 127.0, 111.0, 93.0, 158.0, 97.0, 133.0, 115.0]
2025-05-13 10:56:17,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 6 hours, 6 minutes, 51 seconds)
2025-05-13 11:01:04,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:01:06,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 673.51868 ± 170.875
2025-05-13 11:01:06,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [958.71, 513.055, 371.76868, 717.6731, 587.6198, 826.6526, 773.07355, 795.16095, 485.60156, 705.8715]
2025-05-13 11:01:06,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 92.0, 68.0, 134.0, 105.0, 156.0, 155.0, 141.0, 88.0, 142.0]
2025-05-13 11:01:06,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 6 hours, 3 minutes, 12 seconds)
2025-05-13 11:05:49,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:05:51,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 615.64258 ± 180.709
2025-05-13 11:05:51,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [731.4461, 566.64905, 849.5365, 692.6149, 708.5841, 142.01462, 547.6823, 557.4519, 660.1196, 700.3266]
2025-05-13 11:05:51,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 109.0, 163.0, 128.0, 126.0, 27.0, 111.0, 99.0, 119.0, 134.0]
2025-05-13 11:05:51,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 57 minutes, 57 seconds)
2025-05-13 11:10:36,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:10:39,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 653.83557 ± 131.548
2025-05-13 11:10:39,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [532.7941, 635.9843, 510.3424, 616.4403, 586.8393, 884.73315, 872.6874, 712.7974, 691.5632, 494.17447]
2025-05-13 11:10:39,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 113.0, 102.0, 112.0, 104.0, 168.0, 160.0, 147.0, 124.0, 89.0]
2025-05-13 11:10:39,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 53 minutes, 37 seconds)
2025-05-13 11:15:27,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:15:29,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 720.56604 ± 117.823
2025-05-13 11:15:29,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [835.5883, 601.13806, 634.9739, 817.28455, 788.83124, 601.96075, 938.63135, 557.8581, 744.9224, 684.47186]
2025-05-13 11:15:29,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 107.0, 114.0, 146.0, 153.0, 108.0, 175.0, 100.0, 132.0, 130.0]
2025-05-13 11:15:29,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (720.57) for latency MM1Queue_a033_s075
2025-05-13 11:15:29,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 49 minutes, 59 seconds)
2025-05-13 11:20:13,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:20:15,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 691.75043 ± 134.605
2025-05-13 11:20:15,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [657.8245, 501.52512, 635.2254, 552.9396, 760.6659, 609.92523, 902.1663, 836.1221, 875.1425, 585.9678]
2025-05-13 11:20:15,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 91.0, 112.0, 99.0, 140.0, 111.0, 168.0, 150.0, 158.0, 103.0]
2025-05-13 11:20:15,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 45 minutes, 12 seconds)
2025-05-13 11:24:59,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:25:02,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 756.64526 ± 91.515
2025-05-13 11:25:02,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [779.56995, 815.8789, 611.2162, 858.53265, 711.8327, 741.1157, 665.6087, 733.1537, 706.8438, 942.7]
2025-05-13 11:25:02,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 149.0, 119.0, 162.0, 135.0, 134.0, 121.0, 138.0, 128.0, 169.0]
2025-05-13 11:25:02,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (756.65) for latency MM1Queue_a033_s075
2025-05-13 11:25:02,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 39 minutes, 53 seconds)
2025-05-13 11:29:46,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:29:49,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 722.40814 ± 120.086
2025-05-13 11:29:49,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [665.9359, 589.1106, 630.30164, 770.38885, 684.0361, 621.3269, 616.1558, 885.8976, 800.49225, 960.43585]
2025-05-13 11:29:49,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 102.0, 115.0, 146.0, 129.0, 111.0, 109.0, 165.0, 140.0, 173.0]
2025-05-13 11:29:49,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 35 minutes, 19 seconds)
2025-05-13 11:34:33,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:34:35,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 708.27386 ± 95.523
2025-05-13 11:34:35,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [914.7349, 665.3869, 831.09595, 595.9258, 623.7752, 628.56195, 769.55615, 689.89905, 675.02216, 688.7807]
2025-05-13 11:34:35,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 120.0, 148.0, 118.0, 115.0, 119.0, 145.0, 126.0, 123.0, 124.0]
2025-05-13 11:34:35,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 30 minutes, 28 seconds)
2025-05-13 11:39:23,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:39:26,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 731.81604 ± 100.370
2025-05-13 11:39:26,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [692.69824, 671.7334, 853.05225, 600.0672, 753.1763, 780.19727, 939.0984, 702.1666, 723.92377, 602.04694]
2025-05-13 11:39:26,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 123.0, 157.0, 128.0, 135.0, 150.0, 171.0, 133.0, 131.0, 110.0]
2025-05-13 11:39:26,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 25 minutes, 40 seconds)
2025-05-13 11:44:11,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:44:13,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 660.72327 ± 158.542
2025-05-13 11:44:13,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [340.8389, 674.07465, 592.07086, 825.5956, 643.8925, 425.8453, 768.70306, 710.4004, 835.9166, 789.8948]
2025-05-13 11:44:13,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 120.0, 105.0, 155.0, 114.0, 79.0, 158.0, 140.0, 151.0, 139.0]
2025-05-13 11:44:13,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 5 hours, 21 minutes, 6 seconds)
2025-05-13 11:48:55,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:48:57,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 671.91656 ± 84.342
2025-05-13 11:48:57,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [783.0006, 527.8868, 641.22345, 728.0933, 601.2538, 728.85803, 615.7488, 782.9902, 585.57, 724.54095]
2025-05-13 11:48:57,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 96.0, 116.0, 141.0, 107.0, 143.0, 111.0, 139.0, 102.0, 136.0]
2025-05-13 11:48:57,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 5 hours, 15 minutes, 44 seconds)
2025-05-13 11:53:45,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:53:47,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 659.35364 ± 105.008
2025-05-13 11:53:47,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [562.6504, 665.9971, 466.53958, 658.4036, 767.51056, 767.1199, 676.5421, 517.2162, 730.4134, 781.143]
2025-05-13 11:53:47,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 118.0, 96.0, 119.0, 136.0, 136.0, 119.0, 92.0, 132.0, 138.0]
2025-05-13 11:53:47,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 5 hours, 11 minutes, 42 seconds)
2025-05-13 11:58:31,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:58:33,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 590.40741 ± 130.804
2025-05-13 11:58:33,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [550.334, 798.2688, 695.6175, 555.8467, 540.0485, 420.69913, 487.91364, 740.4189, 403.34467, 711.5826]
2025-05-13 11:58:33,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 142.0, 146.0, 98.0, 97.0, 76.0, 87.0, 131.0, 73.0, 124.0]
2025-05-13 11:58:33,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 5 hours, 6 minutes, 36 seconds)
2025-05-13 12:03:17,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:03:20,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 688.23407 ± 133.301
2025-05-13 12:03:20,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [728.3537, 794.7496, 548.02594, 480.7862, 647.3724, 874.3142, 836.29785, 495.80597, 768.14124, 708.49365]
2025-05-13 12:03:20,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 143.0, 97.0, 90.0, 114.0, 162.0, 149.0, 94.0, 139.0, 127.0]
2025-05-13 12:03:20,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 5 hours, 1 minute, 2 seconds)
2025-05-13 12:08:04,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:08:06,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 684.16718 ± 104.970
2025-05-13 12:08:06,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [753.7954, 804.36414, 699.58923, 616.7971, 874.1721, 598.03735, 524.0566, 571.9269, 743.5848, 655.3481]
2025-05-13 12:08:06,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 142.0, 127.0, 119.0, 156.0, 110.0, 93.0, 104.0, 130.0, 121.0]
2025-05-13 12:08:06,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 56 minutes, 11 seconds)
2025-05-13 12:12:53,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:12:55,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 766.48273 ± 118.500
2025-05-13 12:12:55,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [934.3847, 811.7999, 880.54224, 834.30237, 540.11237, 761.01605, 773.09045, 819.2729, 737.8476, 572.45856]
2025-05-13 12:12:55,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 144.0, 162.0, 147.0, 96.0, 134.0, 140.0, 145.0, 143.0, 117.0]
2025-05-13 12:12:55,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (766.48) for latency MM1Queue_a033_s075
2025-05-13 12:12:55,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 52 minutes, 23 seconds)
2025-05-13 12:17:41,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:17:43,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 656.10046 ± 129.000
2025-05-13 12:17:43,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [681.99335, 637.6335, 740.84766, 715.4215, 729.33844, 550.5334, 531.99945, 871.6337, 386.24615, 715.35803]
2025-05-13 12:17:43,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 111.0, 138.0, 126.0, 140.0, 101.0, 95.0, 159.0, 71.0, 130.0]
2025-05-13 12:17:43,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 47 minutes, 8 seconds)
2025-05-13 12:22:27,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:22:29,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 741.96545 ± 98.915
2025-05-13 12:22:29,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [705.154, 824.8617, 800.1573, 596.93695, 761.2983, 874.10284, 671.62573, 603.7173, 694.69086, 887.10974]
2025-05-13 12:22:29,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 144.0, 145.0, 106.0, 142.0, 160.0, 123.0, 108.0, 145.0, 156.0]
2025-05-13 12:22:29,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 42 minutes, 32 seconds)
2025-05-13 12:27:14,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:27:16,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 578.18677 ± 136.905
2025-05-13 12:27:16,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [458.4192, 812.47845, 835.54285, 438.5057, 556.2233, 433.99863, 557.70404, 617.4731, 588.74084, 482.7817]
2025-05-13 12:27:16,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 146.0, 148.0, 78.0, 99.0, 84.0, 99.0, 113.0, 127.0, 86.0]
2025-05-13 12:27:16,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 37 minutes, 39 seconds)
2025-05-13 12:32:03,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:32:05,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 691.44965 ± 130.083
2025-05-13 12:32:05,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [897.6248, 902.72217, 685.85876, 815.1159, 659.07214, 489.61847, 645.1283, 586.73535, 599.5702, 633.0505]
2025-05-13 12:32:05,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 172.0, 132.0, 143.0, 129.0, 86.0, 115.0, 104.0, 108.0, 116.0]
2025-05-13 12:32:05,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 33 minutes, 26 seconds)
2025-05-13 12:36:47,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:36:50,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 733.31506 ± 105.592
2025-05-13 12:36:50,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [911.85266, 778.1892, 801.68524, 791.5839, 614.42017, 804.786, 551.577, 605.6153, 758.3265, 715.11414]
2025-05-13 12:36:50,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 141.0, 141.0, 141.0, 110.0, 143.0, 98.0, 107.0, 137.0, 126.0]
2025-05-13 12:36:50,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 27 minutes, 42 seconds)
2025-05-13 12:41:35,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:41:38,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 745.32782 ± 102.469
2025-05-13 12:41:38,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [581.2942, 715.184, 800.943, 752.7714, 744.002, 706.1168, 704.545, 953.2764, 865.0016, 630.1442]
2025-05-13 12:41:38,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 134.0, 149.0, 145.0, 138.0, 135.0, 132.0, 178.0, 163.0, 126.0]
2025-05-13 12:41:38,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 23 minutes, 6 seconds)
2025-05-13 12:46:21,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:46:24,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 761.28845 ± 141.686
2025-05-13 12:46:24,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [793.8098, 702.68536, 769.73517, 1035.8203, 866.27124, 760.0398, 628.3489, 787.865, 806.0394, 462.27]
2025-05-13 12:46:24,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 127.0, 142.0, 200.0, 153.0, 134.0, 128.0, 140.0, 156.0, 82.0]
2025-05-13 12:46:24,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 4 hours, 18 minutes, 12 seconds)
2025-05-13 12:51:07,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:51:10,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 750.05786 ± 164.250
2025-05-13 12:51:10,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [912.70715, 473.14667, 760.073, 728.3686, 634.4203, 703.8503, 925.79083, 523.3325, 845.62274, 993.2666]
2025-05-13 12:51:10,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 83.0, 137.0, 132.0, 116.0, 124.0, 172.0, 99.0, 158.0, 181.0]
2025-05-13 12:51:10,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 4 hours, 13 minutes, 22 seconds)
2025-05-13 12:55:58,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:56:01,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 720.50757 ± 137.683
2025-05-13 12:56:01,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [552.1478, 762.32245, 742.0192, 574.7378, 491.234, 877.0394, 690.03705, 947.98505, 781.33075, 786.22284]
2025-05-13 12:56:01,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 133.0, 133.0, 100.0, 87.0, 155.0, 135.0, 174.0, 140.0, 138.0]
2025-05-13 12:56:01,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 4 hours, 8 minutes, 45 seconds)
2025-05-13 13:00:42,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:00:45,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 795.47437 ± 158.045
2025-05-13 13:00:45,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [947.98193, 897.243, 693.9936, 391.8493, 776.4232, 908.7119, 851.9242, 749.40607, 951.56024, 785.65045]
2025-05-13 13:00:45,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 159.0, 125.0, 72.0, 139.0, 163.0, 160.0, 142.0, 181.0, 137.0]
2025-05-13 13:00:45,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (795.47) for latency MM1Queue_a033_s075
2025-05-13 13:00:45,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 4 hours, 3 minutes, 57 seconds)
2025-05-13 13:05:31,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:05:33,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 715.61499 ± 114.958
2025-05-13 13:05:33,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [539.1411, 695.6152, 683.23145, 928.2486, 645.8039, 695.28937, 776.7636, 586.66974, 720.4447, 884.94226]
2025-05-13 13:05:33,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 121.0, 118.0, 166.0, 118.0, 127.0, 145.0, 104.0, 130.0, 158.0]
2025-05-13 13:05:33,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 59 minutes, 10 seconds)
2025-05-13 13:10:17,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:10:20,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 898.63281 ± 160.573
2025-05-13 13:10:20,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [971.21906, 1058.4468, 572.8704, 722.66406, 1106.5686, 810.0561, 885.13586, 997.0712, 815.2341, 1047.0618]
2025-05-13 13:10:20,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 191.0, 100.0, 126.0, 213.0, 143.0, 156.0, 180.0, 149.0, 190.0]
2025-05-13 13:10:20,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (898.63) for latency MM1Queue_a033_s075
2025-05-13 13:10:20,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 54 minutes, 33 seconds)
2025-05-13 13:15:03,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:15:06,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 800.17615 ± 111.759
2025-05-13 13:15:06,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [669.5654, 1040.3145, 796.05237, 648.9007, 839.34845, 858.6511, 743.41425, 690.4922, 841.7961, 873.22656]
2025-05-13 13:15:06,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 192.0, 142.0, 116.0, 160.0, 151.0, 155.0, 121.0, 149.0, 159.0]
2025-05-13 13:15:06,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 49 minutes, 50 seconds)
2025-05-13 13:19:52,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:19:55,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 757.49817 ± 174.555
2025-05-13 13:19:55,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [950.0262, 731.4271, 554.5205, 974.3909, 793.32306, 647.1492, 1058.4327, 524.4505, 722.01624, 619.24524]
2025-05-13 13:19:55,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 136.0, 98.0, 180.0, 142.0, 123.0, 213.0, 92.0, 127.0, 109.0]
2025-05-13 13:19:55,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 44 minutes, 39 seconds)
2025-05-13 13:24:37,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:24:40,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 809.40552 ± 147.018
2025-05-13 13:24:40,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [540.3196, 709.4129, 607.57153, 973.18054, 734.2906, 920.52454, 979.4242, 915.0423, 899.3634, 814.92584]
2025-05-13 13:24:40,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 138.0, 106.0, 175.0, 132.0, 175.0, 180.0, 164.0, 160.0, 145.0]
2025-05-13 13:24:40,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 40 minutes, 1 second)
2025-05-13 13:29:24,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:29:26,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 627.07385 ± 219.315
2025-05-13 13:29:26,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [674.4445, 887.827, 220.34317, 561.8337, 549.04456, 317.2064, 760.6884, 756.64606, 594.3136, 948.3913]
2025-05-13 13:29:26,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 159.0, 42.0, 112.0, 110.0, 59.0, 133.0, 132.0, 107.0, 176.0]
2025-05-13 13:29:26,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 34 minutes, 56 seconds)
2025-05-13 13:34:13,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:34:16,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 769.94287 ± 83.276
2025-05-13 13:34:16,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [669.42505, 853.27545, 863.7254, 794.91754, 651.6419, 848.3223, 627.83264, 800.4307, 808.521, 781.33704]
2025-05-13 13:34:16,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 161.0, 161.0, 140.0, 120.0, 151.0, 113.0, 144.0, 159.0, 141.0]
2025-05-13 13:34:16,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 30 minutes, 34 seconds)
2025-05-13 13:38:58,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:39:01,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 764.15442 ± 171.303
2025-05-13 13:39:01,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [614.1619, 772.3974, 562.19183, 966.13544, 942.78375, 680.1555, 648.3151, 531.05664, 1007.75946, 916.5868]
2025-05-13 13:39:01,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 137.0, 102.0, 177.0, 169.0, 123.0, 115.0, 95.0, 196.0, 166.0]
2025-05-13 13:39:01,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 25 minutes, 34 seconds)
2025-05-13 13:43:46,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:43:49,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 768.96063 ± 181.378
2025-05-13 13:43:49,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [897.5997, 477.9617, 805.02136, 498.66922, 858.5659, 679.677, 598.46967, 947.13135, 940.004, 986.5068]
2025-05-13 13:43:49,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 87.0, 142.0, 88.0, 157.0, 122.0, 111.0, 168.0, 169.0, 176.0]
2025-05-13 13:43:49,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 20 minutes, 46 seconds)
2025-05-13 13:48:32,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:48:35,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 770.58752 ± 267.151
2025-05-13 13:48:35,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [642.3826, 1198.324, 926.9748, 576.96234, 349.54678, 652.84015, 424.19012, 973.2209, 1057.7006, 903.7334]
2025-05-13 13:48:35,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 232.0, 167.0, 108.0, 66.0, 117.0, 78.0, 174.0, 212.0, 188.0]
2025-05-13 13:48:35,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 16 minutes, 11 seconds)
2025-05-13 13:53:20,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:53:23,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 793.03345 ± 110.208
2025-05-13 13:53:23,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [686.9672, 723.03345, 799.4393, 607.2457, 903.69525, 876.47687, 886.85205, 968.54456, 683.79193, 794.2878]
2025-05-13 13:53:23,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 130.0, 141.0, 109.0, 175.0, 157.0, 163.0, 173.0, 118.0, 139.0]
2025-05-13 13:53:23,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 3 hours, 11 minutes, 36 seconds)
2025-05-13 13:58:10,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:58:13,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 776.49561 ± 129.863
2025-05-13 13:58:13,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [722.2859, 799.3139, 725.4995, 587.96484, 1024.4377, 687.05444, 892.4194, 605.16565, 878.9774, 841.837]
2025-05-13 13:58:13,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 142.0, 140.0, 108.0, 189.0, 136.0, 159.0, 106.0, 158.0, 162.0]
2025-05-13 13:58:13,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 3 hours, 6 minutes, 47 seconds)
2025-05-13 14:02:57,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:03:00,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 763.94220 ± 105.999
2025-05-13 14:03:00,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [764.8609, 751.4877, 757.46124, 895.5723, 636.6927, 944.8219, 810.36145, 635.4338, 830.0608, 612.6688]
2025-05-13 14:03:00,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 133.0, 137.0, 160.0, 113.0, 168.0, 146.0, 126.0, 145.0, 109.0]
2025-05-13 14:03:00,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 3 hours, 2 minutes, 20 seconds)
2025-05-13 14:07:42,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:07:45,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 800.65802 ± 93.157
2025-05-13 14:07:45,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [855.6886, 866.5241, 701.24603, 867.2947, 787.55505, 909.7709, 704.1213, 670.5348, 708.21466, 935.6298]
2025-05-13 14:07:45,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 158.0, 126.0, 158.0, 140.0, 172.0, 126.0, 124.0, 124.0, 174.0]
2025-05-13 14:07:45,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 57 minutes, 8 seconds)
2025-05-13 14:12:32,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:12:34,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 730.86395 ± 162.708
2025-05-13 14:12:34,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [993.62555, 634.5925, 613.6621, 921.4766, 713.16486, 689.34814, 382.63474, 824.63403, 739.9748, 795.5266]
2025-05-13 14:12:34,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 120.0, 107.0, 163.0, 146.0, 139.0, 81.0, 143.0, 129.0, 141.0]
2025-05-13 14:12:34,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 52 minutes, 43 seconds)
2025-05-13 14:17:17,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:17:20,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 842.22168 ± 283.357
2025-05-13 14:17:20,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [787.4506, 355.75845, 848.072, 917.2278, 843.4233, 411.44885, 1087.2661, 911.28204, 864.22565, 1396.0624]
2025-05-13 14:17:20,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 65.0, 149.0, 167.0, 154.0, 74.0, 205.0, 160.0, 168.0, 271.0]
2025-05-13 14:17:20,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 47 minutes, 38 seconds)
2025-05-13 14:22:04,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:22:08,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 909.03918 ± 137.802
2025-05-13 14:22:08,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [770.3086, 855.46106, 1081.8265, 1082.2395, 1066.0156, 982.315, 875.6777, 697.8248, 945.37787, 733.34607]
2025-05-13 14:22:08,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 161.0, 200.0, 199.0, 198.0, 179.0, 156.0, 122.0, 171.0, 131.0]
2025-05-13 14:22:08,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (909.04) for latency MM1Queue_a033_s075
2025-05-13 14:22:08,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 42 minutes, 39 seconds)
2025-05-13 14:26:51,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:26:53,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 752.66321 ± 170.602
2025-05-13 14:26:53,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [747.43384, 666.4959, 691.3229, 688.1901, 729.339, 818.28455, 371.8097, 996.1675, 1002.9479, 814.64087]
2025-05-13 14:26:53,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 120.0, 127.0, 128.0, 129.0, 150.0, 67.0, 178.0, 182.0, 152.0]
2025-05-13 14:26:53,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 37 minutes, 38 seconds)
2025-05-13 14:31:39,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:31:42,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 727.85120 ± 176.076
2025-05-13 14:31:42,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [803.2425, 815.7533, 956.3732, 565.72003, 869.1317, 896.30707, 455.15625, 643.1427, 447.54242, 826.142]
2025-05-13 14:31:42,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 148.0, 171.0, 103.0, 163.0, 160.0, 80.0, 116.0, 80.0, 149.0]
2025-05-13 14:31:42,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 33 minutes, 14 seconds)
2025-05-13 14:36:28,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:36:31,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 862.18781 ± 151.003
2025-05-13 14:36:31,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [995.5489, 924.6074, 857.50244, 811.5518, 1054.6736, 655.9832, 961.75757, 969.69666, 848.55927, 541.99774]
2025-05-13 14:36:31,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 165.0, 152.0, 155.0, 190.0, 116.0, 171.0, 175.0, 152.0, 105.0]
2025-05-13 14:36:31,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 28 minutes, 24 seconds)
2025-05-13 14:41:13,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:41:16,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 751.74329 ± 169.339
2025-05-13 14:41:16,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [404.9379, 750.0839, 849.0076, 709.09753, 720.611, 1125.8116, 653.1492, 815.42163, 759.28296, 730.02985]
2025-05-13 14:41:16,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 133.0, 168.0, 127.0, 144.0, 207.0, 121.0, 150.0, 158.0, 138.0]
2025-05-13 14:41:16,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 23 minutes, 34 seconds)
2025-05-13 14:45:59,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:46:01,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 679.40118 ± 188.853
2025-05-13 14:46:01,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [613.52655, 1056.2482, 523.096, 426.20932, 581.10364, 890.10065, 634.28955, 563.81885, 608.7009, 896.91785]
2025-05-13 14:46:01,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 213.0, 100.0, 78.0, 106.0, 158.0, 111.0, 99.0, 107.0, 164.0]
2025-05-13 14:46:01,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 18 minutes, 33 seconds)
2025-05-13 14:50:46,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:50:49,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 851.52411 ± 119.907
2025-05-13 14:50:49,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [941.4441, 872.8428, 558.68567, 1003.54626, 952.82336, 889.22656, 883.28595, 743.99585, 862.5539, 806.8366]
2025-05-13 14:50:49,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 155.0, 100.0, 204.0, 177.0, 157.0, 160.0, 133.0, 152.0, 148.0]
2025-05-13 14:50:49,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 14 minutes, 2 seconds)
2025-05-13 14:55:37,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:55:40,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 808.30994 ± 137.716
2025-05-13 14:55:40,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [868.25854, 637.6369, 953.1006, 771.1871, 877.7825, 693.95276, 981.3322, 943.6079, 809.54767, 546.6934]
2025-05-13 14:55:40,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 115.0, 168.0, 143.0, 154.0, 142.0, 177.0, 171.0, 143.0, 98.0]
2025-05-13 14:55:40,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 9 minutes, 26 seconds)
2025-05-13 15:00:25,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:00:28,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 850.12372 ± 162.779
2025-05-13 15:00:28,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [702.1986, 795.70734, 933.85876, 1220.2831, 673.0077, 717.75464, 869.6835, 1022.4909, 719.34424, 846.90906]
2025-05-13 15:00:28,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 138.0, 165.0, 240.0, 124.0, 129.0, 152.0, 183.0, 127.0, 148.0]
2025-05-13 15:00:28,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 2 hours, 4 minutes, 33 seconds)
2025-05-13 15:05:13,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:05:16,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 824.14178 ± 238.015
2025-05-13 15:05:16,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [859.96484, 934.8067, 903.4168, 140.64113, 1031.6985, 835.7901, 957.6394, 883.1068, 763.6659, 930.6872]
2025-05-13 15:05:16,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 168.0, 159.0, 27.0, 192.0, 156.0, 174.0, 163.0, 143.0, 166.0]
2025-05-13 15:05:16,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 2 hours, 1 second)
2025-05-13 15:10:02,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:10:05,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 824.43811 ± 160.269
2025-05-13 15:10:05,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [958.22943, 1012.7154, 816.90106, 563.2343, 1035.1934, 839.76556, 537.0462, 830.18756, 888.5842, 762.52295]
2025-05-13 15:10:05,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 186.0, 164.0, 114.0, 187.0, 167.0, 107.0, 146.0, 163.0, 145.0]
2025-05-13 15:10:05,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 55 minutes, 32 seconds)
2025-05-13 15:14:52,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:14:55,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 819.60986 ± 124.631
2025-05-13 15:14:55,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [805.9153, 653.6218, 814.4636, 878.4337, 913.6333, 856.0108, 863.14435, 1061.9083, 749.3538, 599.6132]
2025-05-13 15:14:55,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 117.0, 146.0, 154.0, 168.0, 152.0, 153.0, 192.0, 136.0, 126.0]
2025-05-13 15:14:55,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 50 minutes, 48 seconds)
2025-05-13 15:19:38,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:19:41,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 800.92706 ± 136.190
2025-05-13 15:19:41,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [731.80383, 1031.4049, 650.32745, 848.67065, 686.284, 852.83777, 549.9379, 854.23096, 885.16565, 918.60754]
2025-05-13 15:19:41,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 184.0, 117.0, 149.0, 120.0, 152.0, 101.0, 150.0, 155.0, 163.0]
2025-05-13 15:19:41,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 45 minutes, 41 seconds)
2025-05-13 15:24:30,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:24:33,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 860.69971 ± 80.503
2025-05-13 15:24:33,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [802.00586, 908.6782, 841.3709, 944.8074, 853.43243, 678.27985, 804.1688, 916.00854, 892.62445, 965.62103]
2025-05-13 15:24:33,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 161.0, 156.0, 167.0, 151.0, 121.0, 153.0, 166.0, 162.0, 174.0]
2025-05-13 15:24:33,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 41 minutes, 10 seconds)
2025-05-13 15:29:16,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:29:19,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 827.60663 ± 182.634
2025-05-13 15:29:19,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [931.24536, 690.2547, 1010.4956, 928.7191, 966.7424, 561.12335, 454.74817, 949.06824, 954.5801, 829.0893]
2025-05-13 15:29:19,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 126.0, 184.0, 188.0, 188.0, 119.0, 100.0, 168.0, 181.0, 160.0]
2025-05-13 15:29:19,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 36 minutes, 13 seconds)
2025-05-13 15:34:02,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:34:04,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 755.05725 ± 122.086
2025-05-13 15:34:04,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [738.3996, 796.86884, 933.87085, 829.0753, 496.1773, 665.09827, 860.3283, 779.3567, 828.3229, 623.074]
2025-05-13 15:34:04,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 142.0, 165.0, 149.0, 89.0, 120.0, 153.0, 137.0, 145.0, 110.0]
2025-05-13 15:34:04,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 31 minutes, 8 seconds)
2025-05-13 15:38:54,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:38:57,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 793.13208 ± 179.243
2025-05-13 15:38:57,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [863.19226, 728.78143, 356.86102, 973.70685, 795.121, 971.0332, 969.80945, 822.3353, 815.6666, 634.8138]
2025-05-13 15:38:57,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 141.0, 67.0, 181.0, 144.0, 180.0, 184.0, 150.0, 159.0, 119.0]
2025-05-13 15:38:57,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 26 minutes, 31 seconds)
2025-05-13 15:43:40,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:43:44,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 841.61755 ± 121.962
2025-05-13 15:43:44,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [751.6823, 1025.3405, 872.58984, 888.9476, 739.74786, 894.95514, 727.94244, 621.0841, 1004.98926, 888.8965]
2025-05-13 15:43:44,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 195.0, 155.0, 165.0, 138.0, 163.0, 146.0, 127.0, 177.0, 165.0]
2025-05-13 15:43:44,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 21 minutes, 44 seconds)
2025-05-13 15:48:26,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:48:29,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 876.19678 ± 50.585
2025-05-13 15:48:29,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [924.429, 888.99866, 921.77155, 810.52155, 941.1181, 919.13104, 804.436, 806.19135, 889.63635, 855.73456]
2025-05-13 15:48:29,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 156.0, 169.0, 144.0, 168.0, 165.0, 155.0, 144.0, 158.0, 154.0]
2025-05-13 15:48:29,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 16 minutes, 34 seconds)
2025-05-13 15:53:10,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:53:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 944.56458 ± 92.788
2025-05-13 15:53:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [797.9624, 979.79755, 954.3547, 937.62463, 936.32825, 781.86365, 1101.4385, 1004.3544, 1033.0723, 918.8486]
2025-05-13 15:53:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 177.0, 170.0, 170.0, 167.0, 135.0, 226.0, 184.0, 191.0, 164.0]
2025-05-13 15:53:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (944.56) for latency MM1Queue_a033_s075
2025-05-13 15:53:13,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 11 minutes, 41 seconds)
2025-05-13 15:57:57,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:58:00,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 791.04694 ± 167.395
2025-05-13 15:58:00,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [860.6946, 828.51227, 683.2459, 617.45667, 588.2778, 894.2328, 912.0962, 951.7057, 1051.6453, 522.6019]
2025-05-13 15:58:00,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 146.0, 141.0, 109.0, 123.0, 162.0, 164.0, 190.0, 194.0, 104.0]
2025-05-13 15:58:00,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 7 minutes)
2025-05-13 16:02:46,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:02:49,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 792.25079 ± 168.292
2025-05-13 16:02:49,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [933.0981, 403.3401, 940.48694, 745.0108, 984.9895, 804.5115, 761.8664, 601.66156, 867.0852, 880.4581]
2025-05-13 16:02:49,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 75.0, 167.0, 133.0, 180.0, 141.0, 134.0, 112.0, 161.0, 172.0]
2025-05-13 16:02:49,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 1 hour, 2 minutes, 5 seconds)
2025-05-13 16:07:29,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:07:31,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 802.38135 ± 107.586
2025-05-13 16:07:31,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [890.5679, 784.19434, 849.5281, 882.65375, 889.4332, 556.8485, 879.538, 687.6974, 879.14465, 724.20825]
2025-05-13 16:07:31,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 137.0, 155.0, 157.0, 158.0, 98.0, 154.0, 123.0, 154.0, 127.0]
2025-05-13 16:07:31,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 57 minutes, 6 seconds)
2025-05-13 16:12:08,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:12:10,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 783.25568 ± 192.250
2025-05-13 16:12:10,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [788.8872, 783.20294, 747.4529, 375.4853, 923.87024, 915.8601, 939.5521, 862.10236, 1004.52136, 491.62222]
2025-05-13 16:12:10,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 136.0, 131.0, 68.0, 164.0, 163.0, 171.0, 150.0, 183.0, 88.0]
2025-05-13 16:12:10,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 52 minutes, 7 seconds)
2025-05-13 16:16:45,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:16:48,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 829.27832 ± 261.142
2025-05-13 16:16:48,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [533.04095, 771.6243, 1448.5719, 790.4266, 465.8774, 635.1932, 835.0244, 959.056, 935.3897, 918.5795]
2025-05-13 16:16:48,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 139.0, 300.0, 140.0, 83.0, 131.0, 155.0, 177.0, 170.0, 169.0]
2025-05-13 16:16:48,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 47 minutes, 9 seconds)
2025-05-13 16:21:25,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:21:28,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 864.51495 ± 235.737
2025-05-13 16:21:28,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [869.80707, 784.5228, 985.81244, 551.85205, 1037.9897, 465.56702, 1070.7445, 1226.2426, 1015.08167, 637.52936]
2025-05-13 16:21:28,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 138.0, 175.0, 97.0, 185.0, 82.0, 193.0, 234.0, 184.0, 113.0]
2025-05-13 16:21:28,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 42 minutes, 14 seconds)
2025-05-13 16:26:05,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:26:08,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 837.49915 ± 218.821
2025-05-13 16:26:08,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [480.746, 989.9738, 1297.216, 861.83075, 824.6537, 871.8735, 897.83014, 517.76013, 880.88513, 752.2221]
2025-05-13 16:26:08,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 179.0, 248.0, 154.0, 155.0, 155.0, 159.0, 91.0, 159.0, 132.0]
2025-05-13 16:26:08,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 37 minutes, 17 seconds)
2025-05-13 16:31:02,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:31:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 846.83606 ± 185.343
2025-05-13 16:31:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [831.04834, 994.8674, 994.32367, 1041.3916, 673.8044, 1035.0637, 915.48895, 900.7577, 513.83777, 567.7765]
2025-05-13 16:31:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 180.0, 177.0, 189.0, 123.0, 186.0, 161.0, 159.0, 110.0, 104.0]
2025-05-13 16:31:05,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 32 minutes, 58 seconds)
2025-05-13 16:36:01,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:36:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 894.95331 ± 146.576
2025-05-13 16:36:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [996.787, 769.7751, 1027.1455, 924.54584, 959.53876, 503.8172, 913.4906, 977.6597, 972.58514, 904.18896]
2025-05-13 16:36:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 147.0, 187.0, 164.0, 199.0, 90.0, 162.0, 174.0, 181.0, 165.0]
2025-05-13 16:36:05,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 28 minutes, 41 seconds)
2025-05-13 16:41:25,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:41:28,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 893.31281 ± 100.129
2025-05-13 16:41:28,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [975.3134, 928.00635, 838.3963, 1000.92993, 947.90594, 1004.44104, 754.22864, 821.1428, 708.54065, 954.2236]
2025-05-13 16:41:28,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 166.0, 150.0, 185.0, 180.0, 188.0, 137.0, 153.0, 124.0, 168.0]
2025-05-13 16:41:28,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 24 minutes, 40 seconds)
2025-05-13 16:46:26,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:46:29,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 836.36389 ± 108.342
2025-05-13 16:46:29,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [991.0936, 893.7431, 786.4794, 972.0997, 905.74207, 798.69635, 667.03546, 903.42944, 767.2372, 678.082]
2025-05-13 16:46:29,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 163.0, 137.0, 175.0, 162.0, 148.0, 120.0, 167.0, 141.0, 120.0]
2025-05-13 16:46:29,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 20 minutes)
2025-05-13 16:51:01,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:51:05,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 905.06036 ± 156.778
2025-05-13 16:51:05,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [960.14746, 868.93665, 1071.6326, 949.10785, 840.2597, 488.32532, 1006.02106, 1049.4554, 870.6979, 946.01935]
2025-05-13 16:51:05,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 158.0, 197.0, 170.0, 148.0, 93.0, 180.0, 195.0, 158.0, 172.0]
2025-05-13 16:51:05,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 14 minutes, 58 seconds)
2025-05-13 16:55:41,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:55:44,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 931.96497 ± 80.009
2025-05-13 16:55:44,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [957.3664, 905.4364, 921.2608, 844.2472, 1072.8313, 894.0798, 829.6969, 1033.2809, 1010.8826, 850.56726]
2025-05-13 16:55:44,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 165.0, 167.0, 152.0, 220.0, 157.0, 157.0, 187.0, 181.0, 150.0]
2025-05-13 16:55:44,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 9 minutes, 51 seconds)
2025-05-13 17:00:19,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 17:00:22,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 848.94202 ± 182.555
2025-05-13 17:00:22,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1041.4735, 1011.3112, 1134.4407, 707.59216, 788.6474, 835.65546, 838.4978, 466.33905, 926.3046, 739.15796]
2025-05-13 17:00:22,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 191.0, 215.0, 131.0, 145.0, 152.0, 147.0, 85.0, 168.0, 130.0]
2025-05-13 17:00:22,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 51 seconds)
2025-05-13 17:04:56,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 17:04:59,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 828.01788 ± 116.758
2025-05-13 17:04:59,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [917.6299, 757.32635, 852.0118, 948.90063, 829.55237, 548.06854, 891.85114, 973.4777, 777.6426, 783.71783]
2025-05-13 17:04:59,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 136.0, 154.0, 169.0, 165.0, 100.0, 157.0, 175.0, 137.0, 152.0]
2025-05-13 17:04:59,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1251 [DEBUG]: Training session finished
