2025-05-13 09:06:37,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mda-highdim-mem4
2025-05-13 09:06:37,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-bpql-mda-highdim-mem4
2025-05-13 09:06:37,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14e11e361a50>}
2025-05-13 09:06:37,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:37,809 baseline-bpql-mda-noisy-humanoid:91 [WARNING]: args.assumed_delay != args.horizon: 4 != 24
2025-05-13 09:06:37,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-13 09:06:37,827 baseline-bpql-mda-noisy-humanoid:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
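The `NNTanhRefit` module above prints a `scale` of 0.8 and a `shift` of -0.4 for every one of the 17 action dimensions. Below is a minimal sketch of one mapping that fits those values. It assumes, and the log does not confirm, that the refit computes `shift + scale * (tanh(u) + 1) / 2`, so that `scale = high - low` and `shift = low`. Under that assumption the printed values describe an action box of [-0.4, 0.4]. That would match the ±0.4 action bounds of the standard MuJoCo Humanoid task, if that is what `noisy-humanoid` wraps. The function name `tanh_refit` is hypothetical.

```python
import math


def tanh_refit(u, scale, shift):
    """Squash a pre-activation u into [shift, shift + scale].

    Assumed form (not confirmed by the log): shift + scale * (tanh(u) + 1) / 2,
    i.e. scale = high - low and shift = low for an action box [low, high].
    """
    return shift + scale * (math.tanh(u) + 1.0) / 2.0


SCALE, SHIFT = 0.8, -0.4          # values printed for all 17 dimensions
low, high = SHIFT, SHIFT + SCALE  # implied bounds under the assumed form

print(low, high)
print(tanh_refit(0.0, SCALE, SHIFT))  # a zero pre-activation lands mid-box
```

If the module instead computed `scale * tanh(u) + shift`, the same numbers would give a lopsided range of [-1.2, 0.4]. The halved form is therefore the more plausible reading, but it remains a guess.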
2025-05-13 09:06:37,827 baseline-bpql-mda-noisy-humanoid:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
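The first Q layer takes 393 input features. That equals the 376-dimensional state (the size the emitter predicts and `net_embed_state` consumes) plus the 17-dimensional action. This fits `NNLayerConcat2` flattening two inputs and concatenating them on `dim: -1`; which input is left and which is right does not affect the sum.

The sketch below checks that arithmetic and counts the trainable parameters of the two printed MLPs. It uses the standard `nn.Linear` layout: an `out x in` weight plus an `out` bias. It also assumes the refit's `scale` and `shift` are fixed buffers rather than parameters. `ReLU`, `Flatten`, `Unflatten` and `NNLayerSqueeze` contribute nothing. The helper names are illustrative.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameters of one nn.Linear: an (out x in) weight plus an optional bias."""
    return in_features * out_features + (out_features if bias else 0)


STATE_DIM, ACTION_DIM, LATENT_DIM, HIDDEN = 376, 17, 512, 256

# Q network: two flattened inputs concatenated on the last dimension.
q_in = STATE_DIM + ACTION_DIM
q_params = (linear_params(q_in, HIDDEN)
            + linear_params(HIDDEN, HIDDEN)
            + linear_params(HIDDEN, 1))

# Policy: shared two-layer trunk on the 512-dim input, then mu and log_std heads.
pi_params = (linear_params(LATENT_DIM, HIDDEN)
             + linear_params(HIDDEN, HIDDEN)
             + 2 * linear_params(HIDDEN, ACTION_DIM))

print(q_in, q_params, pi_params)  # 393 166913 205858
```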
2025-05-13 09:06:37,835 baseline-bpql-mda-noisy-humanoid:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(17, 512, batch_first=True)
)
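`net_rec` is a single-layer `GRU(17, 512)`. It consumes the 17-dimensional actions directly, since `net_embed_action` is `Identity()`. Its 512-wide hidden state matches the output of `net_embed_state` and the 512 input features of both the emitter and the policy trunk. This suggests, though the log does not confirm it, that the embedded 376-dimensional state seeds the hidden state and the GRU rolls it forward through the pending actions.

The sketch below counts parameters for the GRU and for the state-embedding MLP. For the GRU it uses PyTorch's documented gate layout:

* `weight_ih_l0` with shape (3H, in)
* `weight_hh_l0` with shape (3H, H)
* two bias vectors of length 3H

The helper names are illustrative.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameters of one nn.Linear: an (out x in) weight plus an optional bias."""
    return in_features * out_features + (out_features if bias else 0)


def gru_params(input_size, hidden_size, num_layers=1, bias=True):
    """Parameter count of a unidirectional torch.nn.GRU."""
    total = 0
    for layer in range(num_layers):
        layer_in = input_size if layer == 0 else hidden_size
        # Reset, update and new gates are stacked into 3 * hidden_size rows.
        total += 3 * hidden_size * (layer_in + hidden_size)
        if bias:
            total += 2 * 3 * hidden_size  # bias_ih and bias_hh
    return total


STATE_DIM, ACTION_DIM, GRU_HIDDEN = 376, 17, 512

rec_params = gru_params(ACTION_DIM, GRU_HIDDEN)
embed_state_params = (linear_params(STATE_DIM, 256)
                      + linear_params(256, 256)
                      + linear_params(256, GRU_HIDDEN))

print(rec_params, embed_state_params)  # 815616 293888
```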
2025-05-13 09:06:38,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:38,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:56,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:10:57,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 254.75684 ± 21.864
2025-05-13 09:10:57,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [277.89374, 270.26837, 279.41156, 248.20775, 256.18924, 272.29062, 222.43394, 223.81676, 226.56912, 270.4871]
2025-05-13 09:10:57,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 51.0, 52.0, 46.0, 48.0, 51.0, 43.0, 41.0, 43.0, 51.0]
2025-05-13 09:10:57,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (254.76) for latency MM1Queue_a033_s075
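The `Total Reward: mean ± spread` line summarizes the ten episode returns on the `All rewards` line that follows it. For iteration 1, the spread of 21.864 is the population standard deviation, which divides by N. The sample standard deviation, which divides by N - 1, would be about 23.05. The sketch below recomputes the summary from a log line. The regex and function name are illustrative and not part of `latency_env`.

```python
import re
import statistics

# Matches the bracketed list in an "All rewards: [...]" log line.
REWARDS_RE = re.compile(r"All rewards: \[([^\]]*)\]")


def summarize_rewards(line):
    """Return (mean, population std) of the episode returns on one log line."""
    match = REWARDS_RE.search(line)
    if match is None:
        raise ValueError("not an 'All rewards' line")
    rewards = [float(x) for x in match.group(1).split(",")]
    # pstdev divides by N rather than N - 1.
    return statistics.fmean(rewards), statistics.pstdev(rewards)


line = ("All rewards: [277.89374, 270.26837, 279.41156, 248.20775, 256.18924, "
        "272.29062, 222.43394, 223.81676, 226.56912, 270.4871]")
mean, std = summarize_rewards(line)
print(f"Total Reward: {mean:.5f} ± {std:.3f}")
```

In double precision this gives a mean of 254.75682, against the logged 254.75684. The last-digit gap is presumably float32 accumulation in the trainer.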
2025-05-13 09:10:57,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 7 minutes, 26 seconds)
2025-05-13 09:15:28,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:15:29,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 346.62628 ± 28.756
2025-05-13 09:15:29,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [338.72906, 307.59338, 310.00125, 329.67484, 378.85648, 395.81323, 363.99942, 376.37805, 335.43646, 329.78043]
2025-05-13 09:15:29,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 56.0, 57.0, 60.0, 71.0, 73.0, 67.0, 68.0, 62.0, 60.0]
2025-05-13 09:15:29,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (346.63) for latency MM1Queue_a033_s075
2025-05-13 09:15:29,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 13 minutes, 27 seconds)
2025-05-13 09:20:01,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:20:02,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 455.95425 ± 149.687
2025-05-13 09:20:02,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [299.58304, 586.28076, 443.4604, 274.95996, 457.89084, 477.67017, 378.32013, 270.13596, 725.53937, 645.70184]
2025-05-13 09:20:02,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 120.0, 98.0, 62.0, 101.0, 96.0, 84.0, 59.0, 144.0, 127.0]
2025-05-13 09:20:02,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (455.95) for latency MM1Queue_a033_s075
2025-05-13 09:20:02,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 13 minutes, 22 seconds)
2025-05-13 09:24:33,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:24:34,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 307.10434 ± 18.595
2025-05-13 09:24:34,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [323.335, 310.9243, 274.42477, 305.67706, 330.3325, 281.65662, 287.62454, 325.52228, 310.5912, 320.95517]
2025-05-13 09:24:34,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 57.0, 51.0, 56.0, 61.0, 52.0, 54.0, 60.0, 57.0, 59.0]
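A `New best` line appears only when an iteration's mean evaluation reward exceeds every earlier mean for that latency. Iteration 4 (307.10) comes after iteration 3's 455.95, so it logs none. The sketch below reproduces that bookkeeping. It is reconstructed from the log pattern, not taken from `latency_env`, and the function name is hypothetical.

```python
def new_best_iterations(mean_rewards):
    """Return the 1-based iterations whose mean reward beats all earlier ones."""
    best = float("-inf")
    hits = []
    for iteration, reward in enumerate(mean_rewards, start=1):
        if reward > best:  # strict comparison is an assumption; no ties occur here
            best = reward
            hits.append(iteration)
    return hits


# Mean "Total Reward" of iterations 1-10 from this log.
means = [254.75684, 346.62628, 455.95425, 307.10434, 429.29263,
         502.95312, 402.48483, 383.01324, 512.30005, 569.85199]
print(new_best_iterations(means))  # [1, 2, 3, 6, 9, 10]
```

Extending the list through iteration 30 gives new bests at iterations 1, 2, 3, 6, 9, 10, 16, 18, 21 and 27. These are exactly the iterations that print a `New best` line in this excerpt.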
2025-05-13 09:24:34,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 10 minutes, 7 seconds)
2025-05-13 09:29:06,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:29:07,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 429.29263 ± 44.361
2025-05-13 09:29:07,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [464.57272, 492.07617, 398.23734, 425.23087, 427.33984, 513.7529, 395.91357, 417.1588, 387.02582, 371.61832]
2025-05-13 09:29:07,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 99.0, 73.0, 79.0, 78.0, 100.0, 73.0, 78.0, 73.0, 68.0]
2025-05-13 09:29:07,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 7 minutes, 15 seconds)
2025-05-13 09:33:40,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:33:42,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 502.95312 ± 114.370
2025-05-13 09:33:42,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [525.62744, 675.1166, 440.1242, 514.6599, 578.15955, 342.36633, 323.14243, 667.6484, 432.70944, 529.9769]
2025-05-13 09:33:42,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 140.0, 92.0, 107.0, 125.0, 63.0, 61.0, 128.0, 83.0, 99.0]
2025-05-13 09:33:42,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (502.95) for latency MM1Queue_a033_s075
2025-05-13 09:33:42,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 7 minutes, 28 seconds)
2025-05-13 09:38:13,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:38:15,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 402.48483 ± 21.605
2025-05-13 09:38:15,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [395.84308, 427.43253, 422.94476, 370.31436, 422.54803, 389.25613, 428.07224, 372.33322, 411.5964, 384.50735]
2025-05-13 09:38:15,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 78.0, 79.0, 79.0, 77.0, 71.0, 88.0, 70.0, 75.0, 74.0]
2025-05-13 09:38:15,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 7 hours, 3 minutes, 20 seconds)
2025-05-13 09:42:46,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:42:48,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 383.01324 ± 110.582
2025-05-13 09:42:48,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [420.2475, 384.36945, 348.58435, 112.19705, 578.96796, 327.0673, 399.30392, 433.28137, 402.05103, 424.06232]
2025-05-13 09:42:48,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 72.0, 64.0, 25.0, 109.0, 66.0, 73.0, 80.0, 76.0, 86.0]
2025-05-13 09:42:48,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 58 minutes, 39 seconds)
2025-05-13 09:47:21,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:47:23,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 512.30005 ± 177.672
2025-05-13 09:47:23,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [443.08414, 618.5495, 937.58484, 602.5059, 473.9124, 603.75037, 437.98315, 309.07477, 340.50635, 356.04868]
2025-05-13 09:47:23,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 113.0, 180.0, 112.0, 88.0, 112.0, 80.0, 66.0, 63.0, 66.0]
2025-05-13 09:47:23,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (512.30) for latency MM1Queue_a033_s075
2025-05-13 09:47:23,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 55 minutes, 16 seconds)
2025-05-13 09:51:54,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:51:56,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 569.85199 ± 140.715
2025-05-13 09:51:56,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [662.0106, 426.1759, 871.60657, 594.95056, 505.75116, 616.73785, 616.0804, 621.6933, 427.9318, 355.58206]
2025-05-13 09:51:56,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 78.0, 181.0, 115.0, 95.0, 121.0, 118.0, 122.0, 81.0, 68.0]
2025-05-13 09:51:56,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (569.85) for latency MM1Queue_a033_s075
2025-05-13 09:51:56,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 50 minutes, 33 seconds)
2025-05-13 09:56:28,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:56:29,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 489.54852 ± 109.557
2025-05-13 09:56:29,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [270.93158, 494.49783, 577.89594, 387.2038, 446.75848, 470.22018, 641.88275, 436.48315, 523.3385, 646.27277]
2025-05-13 09:56:29,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 90.0, 120.0, 71.0, 81.0, 84.0, 134.0, 79.0, 107.0, 121.0]
2025-05-13 09:56:29,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 45 minutes, 47 seconds)
2025-05-13 10:01:00,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:01:01,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 461.33643 ± 95.534
2025-05-13 10:01:01,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [533.0871, 532.44415, 524.0635, 489.39883, 431.14908, 386.78912, 540.77496, 527.04175, 220.1652, 428.4507]
2025-05-13 10:01:01,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 97.0, 96.0, 103.0, 87.0, 70.0, 99.0, 95.0, 43.0, 81.0]
2025-05-13 10:01:01,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 40 minutes, 49 seconds)
2025-05-13 10:05:33,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:05:35,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 564.45770 ± 197.186
2025-05-13 10:05:35,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [630.57263, 526.1491, 495.79916, 1027.5642, 400.26306, 799.03894, 491.4966, 317.016, 463.64145, 493.03632]
2025-05-13 10:05:35,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 99.0, 95.0, 210.0, 74.0, 156.0, 92.0, 73.0, 86.0, 93.0]
2025-05-13 10:05:35,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 36 minutes, 29 seconds)
2025-05-13 10:10:08,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:10:09,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 525.51733 ± 117.249
2025-05-13 10:10:09,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [416.11584, 479.14206, 511.18585, 608.384, 559.3544, 664.35724, 503.53976, 704.0779, 535.2888, 273.72693]
2025-05-13 10:10:09,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 88.0, 105.0, 110.0, 106.0, 122.0, 93.0, 135.0, 103.0, 56.0]
2025-05-13 10:10:09,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 31 minutes, 50 seconds)
2025-05-13 10:14:40,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:14:42,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 486.30981 ± 93.522
2025-05-13 10:14:42,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [257.2099, 617.4211, 567.867, 467.02966, 423.84372, 465.6699, 560.8667, 510.7226, 502.70526, 489.76227]
2025-05-13 10:14:42,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 113.0, 104.0, 86.0, 79.0, 85.0, 108.0, 94.0, 96.0, 89.0]
2025-05-13 10:14:42,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 26 minutes, 59 seconds)
2025-05-13 10:19:13,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:19:15,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 579.11395 ± 117.569
2025-05-13 10:19:15,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [558.85364, 565.6366, 850.5939, 575.2831, 619.79834, 702.26965, 445.7924, 539.0539, 431.27197, 502.58633]
2025-05-13 10:19:15,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 112.0, 158.0, 105.0, 115.0, 132.0, 83.0, 97.0, 83.0, 91.0]
2025-05-13 10:19:15,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (579.11) for latency MM1Queue_a033_s075
2025-05-13 10:19:15,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 22 minutes, 20 seconds)
2025-05-13 10:23:45,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:23:47,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 518.66797 ± 83.592
2025-05-13 10:23:47,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [577.92957, 535.21533, 471.86057, 577.76196, 550.36017, 585.98315, 311.78552, 599.94006, 445.6368, 530.20636]
2025-05-13 10:23:47,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 98.0, 103.0, 109.0, 105.0, 110.0, 67.0, 110.0, 97.0, 103.0]
2025-05-13 10:23:47,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 17 minutes, 47 seconds)
2025-05-13 10:28:17,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:28:19,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 586.88605 ± 88.090
2025-05-13 10:28:19,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [528.38617, 628.3721, 535.7708, 520.4988, 551.8436, 567.01855, 514.2701, 699.8259, 793.0958, 529.7788]
2025-05-13 10:28:19,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 118.0, 112.0, 106.0, 101.0, 119.0, 95.0, 144.0, 147.0, 101.0]
2025-05-13 10:28:19,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (586.89) for latency MM1Queue_a033_s075
2025-05-13 10:28:19,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 12 minutes, 50 seconds)
2025-05-13 10:32:51,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:32:53,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 523.37494 ± 151.283
2025-05-13 10:32:53,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [348.50964, 553.89185, 639.82324, 605.7273, 610.7584, 470.21832, 759.3525, 632.8094, 257.9864, 354.67252]
2025-05-13 10:32:53,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 99.0, 122.0, 109.0, 113.0, 100.0, 144.0, 115.0, 54.0, 66.0]
2025-05-13 10:32:53,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 8 minutes, 10 seconds)
2025-05-13 10:37:26,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:37:28,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 535.22247 ± 89.239
2025-05-13 10:37:28,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [347.80594, 503.30692, 495.6951, 438.76678, 598.94257, 626.33875, 557.7434, 600.8152, 659.01196, 523.7977]
2025-05-13 10:37:28,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 101.0, 95.0, 80.0, 111.0, 114.0, 115.0, 108.0, 119.0, 99.0]
2025-05-13 10:37:28,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 6 hours, 4 minutes, 14 seconds)
2025-05-13 10:41:56,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:41:58,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 601.45526 ± 78.348
2025-05-13 10:41:58,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [581.42255, 586.99585, 608.82764, 559.5093, 729.5514, 611.9671, 638.1861, 435.72745, 554.18286, 708.1824]
2025-05-13 10:41:58,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 106.0, 109.0, 101.0, 127.0, 111.0, 117.0, 79.0, 102.0, 129.0]
2025-05-13 10:41:58,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (601.46) for latency MM1Queue_a033_s075
2025-05-13 10:41:58,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 58 minutes, 49 seconds)
2025-05-13 10:46:30,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:46:31,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 541.40619 ± 92.821
2025-05-13 10:46:31,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [526.4951, 598.9882, 478.3984, 577.19415, 761.12976, 490.16873, 545.45483, 381.9365, 509.67343, 544.6231]
2025-05-13 10:46:31,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 111.0, 86.0, 103.0, 149.0, 90.0, 98.0, 72.0, 92.0, 97.0]
2025-05-13 10:46:31,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 54 minutes, 52 seconds)
2025-05-13 10:51:04,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:51:06,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 589.85510 ± 102.039
2025-05-13 10:51:06,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [534.22046, 526.3247, 868.4874, 530.49207, 516.6466, 570.6844, 573.54895, 661.8651, 521.8167, 594.4648]
2025-05-13 10:51:06,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 97.0, 151.0, 98.0, 93.0, 102.0, 106.0, 121.0, 95.0, 107.0]
2025-05-13 10:51:06,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 50 minutes, 52 seconds)
2025-05-13 10:55:35,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:55:37,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 585.65283 ± 68.074
2025-05-13 10:55:37,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [664.3599, 665.70667, 472.4698, 491.4821, 600.2803, 621.99335, 646.92267, 528.981, 624.2014, 540.13074]
2025-05-13 10:55:37,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 122.0, 97.0, 90.0, 111.0, 113.0, 117.0, 97.0, 115.0, 99.0]
2025-05-13 10:55:37,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 45 minutes, 34 seconds)
2025-05-13 11:00:09,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:00:10,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 566.86810 ± 66.332
2025-05-13 11:00:10,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [592.0655, 522.3666, 576.6842, 425.5315, 583.27985, 495.61438, 678.0634, 604.78577, 578.354, 611.93555]
2025-05-13 11:00:10,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 95.0, 104.0, 84.0, 108.0, 90.0, 119.0, 110.0, 105.0, 112.0]
2025-05-13 11:00:10,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 40 minutes, 40 seconds)
2025-05-13 11:04:44,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:04:45,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 562.23572 ± 39.549
2025-05-13 11:04:45,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [553.57245, 519.3788, 603.26196, 579.65515, 532.28265, 496.34134, 552.2708, 549.9992, 611.4728, 624.12244]
2025-05-13 11:04:45,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 93.0, 112.0, 109.0, 100.0, 91.0, 104.0, 101.0, 118.0, 123.0]
2025-05-13 11:04:45,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 37 minutes, 24 seconds)
2025-05-13 11:09:15,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:09:17,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 634.24017 ± 131.207
2025-05-13 11:09:17,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [457.3302, 895.2867, 679.0093, 657.2071, 396.60193, 714.77264, 587.69543, 662.1403, 601.508, 690.8506]
2025-05-13 11:09:17,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 158.0, 128.0, 120.0, 72.0, 129.0, 113.0, 121.0, 112.0, 132.0]
2025-05-13 11:09:17,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (634.24) for latency MM1Queue_a033_s075
2025-05-13 11:09:17,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 32 minutes, 10 seconds)
2025-05-13 11:13:50,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:13:52,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 581.53601 ± 100.150
2025-05-13 11:13:52,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [501.9177, 664.9382, 508.668, 435.26492, 539.44666, 515.22894, 790.8598, 681.50116, 599.19336, 578.342]
2025-05-13 11:13:52,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 122.0, 97.0, 92.0, 98.0, 94.0, 145.0, 137.0, 113.0, 112.0]
2025-05-13 11:13:52,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 27 minutes, 52 seconds)
2025-05-13 11:18:22,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:18:24,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 548.45282 ± 94.637
2025-05-13 11:18:24,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [698.59216, 556.099, 371.86877, 520.0587, 535.11993, 415.06754, 654.8256, 613.9725, 535.45026, 583.47363]
2025-05-13 11:18:24,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 100.0, 83.0, 103.0, 110.0, 77.0, 120.0, 118.0, 109.0, 105.0]
2025-05-13 11:18:24,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 23 minutes, 31 seconds)
2025-05-13 11:23:00,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:23:02,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 624.07483 ± 122.959
2025-05-13 11:23:02,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [656.4432, 577.2499, 658.3639, 335.69778, 601.0944, 527.82935, 615.3309, 726.6517, 767.5946, 774.49207]
2025-05-13 11:23:02,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 106.0, 132.0, 62.0, 108.0, 95.0, 115.0, 130.0, 141.0, 140.0]
2025-05-13 11:23:02,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 19 minutes, 59 seconds)
2025-05-13 11:27:31,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:27:33,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 739.31439 ± 134.435
2025-05-13 11:27:33,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [804.1514, 785.38416, 819.08167, 496.52347, 712.0353, 549.34625, 707.7832, 733.5548, 1002.67255, 782.61115]
2025-05-13 11:27:33,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 144.0, 153.0, 107.0, 132.0, 115.0, 132.0, 135.0, 181.0, 145.0]
2025-05-13 11:27:33,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (739.31) for latency MM1Queue_a033_s075
2025-05-13 11:27:33,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 14 minutes, 38 seconds)
2025-05-13 11:32:06,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:32:08,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 653.53564 ± 110.283
2025-05-13 11:32:08,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [490.06134, 657.4983, 616.98914, 598.7319, 521.0418, 652.59796, 627.2524, 835.27, 848.84686, 687.0676]
2025-05-13 11:32:08,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 122.0, 112.0, 108.0, 99.0, 122.0, 121.0, 147.0, 150.0, 125.0]
2025-05-13 11:32:08,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 10 minutes, 47 seconds)
2025-05-13 11:36:41,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:36:43,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 718.42078 ± 233.952
2025-05-13 11:36:43,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [201.16367, 794.296, 1168.938, 534.30835, 688.2639, 670.73505, 759.1857, 807.5164, 688.1971, 871.60333]
2025-05-13 11:36:43,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 146.0, 216.0, 106.0, 126.0, 133.0, 147.0, 150.0, 124.0, 152.0]
2025-05-13 11:36:43,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 5 hours, 6 minutes, 16 seconds)
2025-05-13 11:41:14,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:41:16,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 609.04089 ± 107.510
2025-05-13 11:41:16,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [515.5934, 526.15314, 574.022, 528.49036, 904.70447, 614.3414, 581.25183, 591.85895, 587.56354, 666.4304]
2025-05-13 11:41:16,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 93.0, 104.0, 94.0, 166.0, 115.0, 104.0, 109.0, 116.0, 128.0]
2025-05-13 11:41:16,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 5 hours, 1 minute, 47 seconds)
2025-05-13 11:45:47,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:45:48,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 586.43884 ± 93.211
2025-05-13 11:45:48,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [597.50323, 534.8311, 473.66083, 525.85266, 644.13055, 413.70526, 711.7087, 641.88824, 709.48724, 611.62115]
2025-05-13 11:45:48,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 102.0, 90.0, 108.0, 117.0, 76.0, 133.0, 121.0, 131.0, 121.0]
2025-05-13 11:45:48,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 56 minutes, 8 seconds)
2025-05-13 11:50:20,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:50:22,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 755.05042 ± 138.365
2025-05-13 11:50:22,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [689.4948, 753.10455, 787.9306, 824.6509, 577.75104, 1103.806, 698.96735, 765.9122, 597.31604, 751.57056]
2025-05-13 11:50:22,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 156.0, 148.0, 153.0, 110.0, 219.0, 128.0, 144.0, 108.0, 136.0]
2025-05-13 11:50:22,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (755.05) for latency MM1Queue_a033_s075
2025-05-13 11:50:22,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 52 minutes, 3 seconds)
2025-05-13 11:54:55,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:54:57,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 635.73175 ± 114.383
2025-05-13 11:54:57,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [776.3311, 740.18054, 751.0661, 619.7872, 669.0597, 631.0381, 439.98535, 666.07916, 642.94775, 420.8425]
2025-05-13 11:54:57,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 138.0, 137.0, 115.0, 119.0, 115.0, 86.0, 122.0, 128.0, 81.0]
2025-05-13 11:54:57,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 47 minutes, 35 seconds)
2025-05-13 11:59:29,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:59:31,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 739.18164 ± 87.769
2025-05-13 11:59:31,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [703.9269, 651.6728, 778.84, 820.1678, 691.97485, 743.3797, 661.9889, 807.5487, 917.26697, 615.04974]
2025-05-13 11:59:31,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 114.0, 150.0, 152.0, 127.0, 134.0, 121.0, 158.0, 190.0, 112.0]
2025-05-13 11:59:31,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 42 minutes, 37 seconds)
2025-05-13 12:04:01,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:04:03,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 640.74573 ± 101.343
2025-05-13 12:04:03,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [471.69882, 637.3937, 623.86993, 807.8619, 703.1407, 573.01666, 731.5146, 698.9758, 484.04443, 675.9404]
2025-05-13 12:04:03,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 111.0, 117.0, 144.0, 134.0, 101.0, 141.0, 126.0, 105.0, 122.0]
2025-05-13 12:04:03,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 37 minutes, 55 seconds)
2025-05-13 12:08:35,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:08:37,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 783.68341 ± 190.422
2025-05-13 12:08:37,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [816.95514, 1124.8463, 845.7694, 486.58102, 656.5846, 748.2505, 1099.8606, 723.89343, 617.9519, 716.14044]
2025-05-13 12:08:37,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 196.0, 153.0, 92.0, 120.0, 154.0, 190.0, 145.0, 113.0, 145.0]
2025-05-13 12:08:37,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (783.68) for latency MM1Queue_a033_s075
2025-05-13 12:08:37,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 33 minutes, 44 seconds)
2025-05-13 12:13:08,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:13:11,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 823.00848 ± 182.490
2025-05-13 12:13:11,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [625.28796, 720.8116, 864.45087, 1156.4592, 725.68964, 645.934, 1009.2574, 713.4699, 690.02, 1078.7039]
2025-05-13 12:13:11,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 132.0, 166.0, 207.0, 128.0, 118.0, 182.0, 131.0, 127.0, 209.0]
2025-05-13 12:13:11,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (823.01) for latency MM1Queue_a033_s075
2025-05-13 12:13:11,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 29 minutes, 6 seconds)
2025-05-13 12:17:44,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:17:46,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 692.97131 ± 161.836
2025-05-13 12:17:46,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [752.75336, 896.6815, 490.1355, 824.00726, 917.39404, 642.24756, 421.12576, 745.9214, 524.4541, 714.9928]
2025-05-13 12:17:46,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 169.0, 99.0, 155.0, 173.0, 120.0, 77.0, 134.0, 93.0, 126.0]
2025-05-13 12:17:46,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 24 minutes, 41 seconds)
2025-05-13 12:22:16,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:22:18,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 747.84412 ± 79.960
2025-05-13 12:22:18,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [740.76605, 695.9546, 738.7868, 885.8033, 725.1941, 645.25757, 899.4315, 670.5993, 707.38135, 769.2671]
2025-05-13 12:22:18,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 124.0, 137.0, 168.0, 130.0, 117.0, 167.0, 122.0, 125.0, 141.0]
2025-05-13 12:22:18,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 19 minutes, 42 seconds)
2025-05-13 12:26:48,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:26:51,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 701.56647 ± 146.047
2025-05-13 12:26:51,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [623.2624, 562.7379, 878.6182, 623.9353, 708.8323, 1013.4502, 713.93677, 782.6546, 515.49194, 592.7447]
2025-05-13 12:26:51,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 102.0, 158.0, 112.0, 138.0, 199.0, 138.0, 141.0, 93.0, 117.0]
2025-05-13 12:26:51,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 15 minutes, 19 seconds)
2025-05-13 12:31:23,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:31:25,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 622.46436 ± 128.197
2025-05-13 12:31:25,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [715.7143, 699.1701, 727.7354, 610.92255, 325.80963, 662.19824, 622.8132, 697.00037, 440.07877, 723.20135]
2025-05-13 12:31:25,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 129.0, 134.0, 125.0, 62.0, 125.0, 120.0, 122.0, 86.0, 133.0]
2025-05-13 12:31:25,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 10 minutes, 46 seconds)
2025-05-13 12:35:57,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:35:59,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 641.36908 ± 98.695
2025-05-13 12:35:59,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [566.44354, 551.86365, 722.5477, 592.6613, 586.54987, 572.3251, 726.7721, 570.6419, 648.63824, 875.2475]
2025-05-13 12:35:59,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 98.0, 131.0, 105.0, 105.0, 103.0, 147.0, 99.0, 115.0, 160.0]
2025-05-13 12:35:59,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 4 hours, 6 minutes, 15 seconds)
2025-05-13 12:40:29,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:40:31,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 818.47839 ± 140.093
2025-05-13 12:40:31,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1033.7207, 573.6903, 880.66205, 851.53143, 809.76886, 683.00946, 765.1888, 724.95276, 809.361, 1052.8983]
2025-05-13 12:40:31,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 121.0, 161.0, 148.0, 145.0, 123.0, 155.0, 133.0, 153.0, 191.0]
2025-05-13 12:40:31,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 4 hours, 1 minute, 7 seconds)
2025-05-13 12:45:04,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:45:06,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 786.33380 ± 197.727
2025-05-13 12:45:06,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [964.73615, 640.06226, 816.9919, 651.23444, 763.0346, 425.3778, 819.291, 1121.8896, 1017.64484, 643.07495]
2025-05-13 12:45:06,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 125.0, 142.0, 116.0, 142.0, 82.0, 158.0, 193.0, 183.0, 115.0]
2025-05-13 12:45:06,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 57 minutes, 13 seconds)
2025-05-13 12:49:37,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:49:39,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 776.66779 ± 180.644
2025-05-13 12:49:39,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [871.1035, 1066.1891, 444.519, 851.4937, 646.0002, 646.2215, 1021.93243, 750.9859, 831.929, 636.3034]
2025-05-13 12:49:39,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 215.0, 92.0, 158.0, 114.0, 116.0, 189.0, 134.0, 154.0, 115.0]
2025-05-13 12:49:39,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 52 minutes, 38 seconds)
2025-05-13 12:54:10,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:54:12,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 748.95886 ± 169.815
2025-05-13 12:54:12,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [509.8455, 732.1422, 719.9649, 494.90204, 859.7898, 1127.2101, 766.9251, 684.0733, 775.5862, 819.1502]
2025-05-13 12:54:12,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 129.0, 130.0, 93.0, 157.0, 206.0, 139.0, 121.0, 136.0, 149.0]
2025-05-13 12:54:12,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 47 minutes, 49 seconds)
2025-05-13 12:58:46,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:58:49,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 815.75488 ± 196.171
2025-05-13 12:58:49,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [831.39636, 738.5508, 1031.4934, 338.22284, 807.2921, 838.5533, 752.293, 752.0952, 1045.2859, 1022.36523]
2025-05-13 12:58:49,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 140.0, 177.0, 65.0, 141.0, 148.0, 137.0, 136.0, 190.0, 185.0]
2025-05-13 12:58:49,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 43 minutes, 44 seconds)
2025-05-13 13:03:20,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:03:22,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 627.04285 ± 61.346
2025-05-13 13:03:22,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [747.9914, 601.13824, 728.45197, 570.2243, 570.62897, 615.2004, 663.9561, 596.10565, 578.6821, 598.04913]
2025-05-13 13:03:22,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 105.0, 128.0, 106.0, 105.0, 111.0, 119.0, 112.0, 103.0, 105.0]
2025-05-13 13:03:22,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 39 minutes, 17 seconds)
2025-05-13 13:07:51,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:07:54,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 796.18976 ± 305.069
2025-05-13 13:07:54,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [760.26544, 563.8805, 922.19006, 744.2352, 1543.7196, 910.577, 523.726, 879.6229, 768.0043, 345.67642]
2025-05-13 13:07:54,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 107.0, 169.0, 143.0, 282.0, 177.0, 105.0, 164.0, 143.0, 71.0]
2025-05-13 13:07:54,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 34 minutes, 12 seconds)
2025-05-13 13:12:26,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:12:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 782.26672 ± 111.596
2025-05-13 13:12:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [853.4877, 644.1015, 749.8618, 960.24274, 996.21906, 731.6138, 745.4253, 748.3403, 729.4872, 663.88824]
2025-05-13 13:12:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 134.0, 143.0, 186.0, 171.0, 132.0, 132.0, 129.0, 128.0, 142.0]
2025-05-13 13:12:29,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 29 minutes, 59 seconds)
2025-05-13 13:17:05,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:17:08,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 764.66687 ± 133.935
2025-05-13 13:17:08,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [697.78845, 884.51825, 1036.4386, 533.27625, 798.52014, 673.53357, 846.4365, 644.40643, 801.53564, 730.2151]
2025-05-13 13:17:08,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 180.0, 188.0, 114.0, 143.0, 119.0, 157.0, 126.0, 158.0, 147.0]
2025-05-13 13:17:08,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 26 minutes, 21 seconds)
2025-05-13 13:21:35,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:21:37,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 654.83160 ± 180.508
2025-05-13 13:21:37,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [687.9837, 488.61084, 494.87027, 737.06366, 662.0187, 866.9687, 552.4508, 333.16415, 961.1268, 764.0581]
2025-05-13 13:21:37,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 87.0, 99.0, 140.0, 116.0, 154.0, 102.0, 62.0, 178.0, 148.0]
2025-05-13 13:21:37,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 20 minutes, 37 seconds)
2025-05-13 13:26:06,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:26:08,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 830.55341 ± 177.374
2025-05-13 13:26:08,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [792.2383, 1122.1013, 878.0727, 802.2723, 671.88385, 733.8457, 716.9904, 556.28625, 886.0761, 1145.7677]
2025-05-13 13:26:08,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 208.0, 175.0, 143.0, 117.0, 135.0, 131.0, 118.0, 159.0, 201.0]
2025-05-13 13:26:08,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (830.55) for latency MM1Queue_a033_s075
2025-05-13 13:26:08,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 15 minutes, 52 seconds)
2025-05-13 13:30:40,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:30:43,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 782.20422 ± 147.373
2025-05-13 13:30:43,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [657.77106, 845.33386, 790.996, 703.06274, 753.8392, 564.7045, 778.9327, 1156.5576, 824.77484, 746.07007]
2025-05-13 13:30:43,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 153.0, 141.0, 128.0, 141.0, 106.0, 143.0, 195.0, 146.0, 132.0]
2025-05-13 13:30:43,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 11 minutes, 40 seconds)
2025-05-13 13:35:15,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:35:18,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 783.10901 ± 174.472
2025-05-13 13:35:18,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1204.738, 722.895, 674.8598, 815.272, 992.6753, 629.16, 792.5302, 685.58716, 679.83875, 633.533]
2025-05-13 13:35:18,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [233.0, 129.0, 123.0, 144.0, 180.0, 111.0, 142.0, 124.0, 122.0, 112.0]
2025-05-13 13:35:18,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 7 minutes, 6 seconds)
2025-05-13 13:39:52,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:39:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 752.97937 ± 170.480
2025-05-13 13:39:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [633.96136, 764.3215, 1094.5121, 846.2544, 676.6259, 1004.88403, 731.96106, 609.91016, 541.989, 625.3738]
2025-05-13 13:39:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 132.0, 185.0, 148.0, 118.0, 174.0, 139.0, 109.0, 94.0, 114.0]
2025-05-13 13:39:55,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 3 hours, 2 minutes, 15 seconds)
2025-05-13 13:44:25,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:44:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 765.51544 ± 125.699
2025-05-13 13:44:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [924.58936, 767.7997, 724.23206, 602.56116, 989.4077, 552.86285, 804.12616, 833.30457, 717.9958, 738.2755]
2025-05-13 13:44:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 138.0, 133.0, 114.0, 188.0, 98.0, 142.0, 172.0, 146.0, 140.0]
2025-05-13 13:44:27,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 58 minutes, 13 seconds)
2025-05-13 13:48:58,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:49:00,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 888.15125 ± 139.453
2025-05-13 13:49:00,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1113.2898, 849.5755, 683.2086, 915.9485, 926.4973, 1035.2117, 864.7934, 1038.4794, 754.6737, 699.835]
2025-05-13 13:49:00,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [198.0, 149.0, 123.0, 152.0, 166.0, 198.0, 156.0, 183.0, 135.0, 125.0]
2025-05-13 13:49:00,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (888.15) for latency MM1Queue_a033_s075
2025-05-13 13:49:00,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 53 minutes, 46 seconds)
2025-05-13 13:53:33,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:53:36,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 793.90930 ± 337.925
2025-05-13 13:53:36,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1065.9548, 1389.6425, 165.38087, 1210.7969, 771.68524, 731.50073, 546.00165, 520.8691, 804.0343, 733.2269]
2025-05-13 13:53:36,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 261.0, 32.0, 240.0, 140.0, 142.0, 111.0, 107.0, 144.0, 137.0]
2025-05-13 13:53:36,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 49 minutes, 19 seconds)
2025-05-13 13:58:04,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:58:06,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 798.66992 ± 168.570
2025-05-13 13:58:06,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [694.91174, 684.26636, 689.9174, 713.24005, 614.28674, 1197.4476, 919.1073, 738.84467, 765.7436, 968.934]
2025-05-13 13:58:06,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 121.0, 121.0, 128.0, 110.0, 210.0, 176.0, 138.0, 133.0, 178.0]
2025-05-13 13:58:06,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 44 minutes, 13 seconds)
2025-05-13 14:02:42,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:02:45,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 861.86932 ± 86.214
2025-05-13 14:02:45,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [814.5791, 803.8997, 807.54474, 737.11475, 752.99365, 916.7617, 884.7936, 959.6116, 942.5532, 998.8414]
2025-05-13 14:02:45,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 149.0, 145.0, 129.0, 139.0, 162.0, 162.0, 168.0, 160.0, 171.0]
2025-05-13 14:02:45,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 39 minutes, 49 seconds)
2025-05-13 14:07:13,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:07:16,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 900.87793 ± 225.702
2025-05-13 14:07:16,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [604.6036, 716.7788, 839.46136, 979.2727, 703.86835, 1201.1794, 1385.152, 793.048, 856.9972, 928.4177]
2025-05-13 14:07:16,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 129.0, 148.0, 171.0, 127.0, 219.0, 238.0, 165.0, 161.0, 164.0]
2025-05-13 14:07:16,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (900.88) for latency MM1Queue_a033_s075
2025-05-13 14:07:16,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 35 minutes, 5 seconds)
2025-05-13 14:11:48,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:11:50,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 619.22607 ± 163.556
2025-05-13 14:11:50,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [620.48724, 686.7082, 411.13568, 806.872, 381.81027, 923.7265, 493.92487, 560.9653, 736.77783, 569.85284]
2025-05-13 14:11:50,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 144.0, 90.0, 145.0, 83.0, 173.0, 98.0, 111.0, 147.0, 114.0]
2025-05-13 14:11:50,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 30 minutes, 39 seconds)
2025-05-13 14:16:21,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:16:24,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 979.77295 ± 104.027
2025-05-13 14:16:24,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1020.36273, 907.2195, 1035.363, 1122.819, 879.5902, 797.2413, 900.1746, 998.80566, 1145.8751, 990.27844]
2025-05-13 14:16:24,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 185.0, 179.0, 201.0, 151.0, 159.0, 163.0, 177.0, 195.0, 175.0]
2025-05-13 14:16:24,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (979.77) for latency MM1Queue_a033_s075
2025-05-13 14:16:24,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 25 minutes, 58 seconds)
2025-05-13 14:20:58,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:21:01,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 814.68463 ± 129.216
2025-05-13 14:21:01,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [792.12396, 1022.58936, 788.46704, 743.25006, 675.6627, 760.49915, 730.3931, 1100.3942, 736.3469, 797.1195]
2025-05-13 14:21:01,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 179.0, 141.0, 132.0, 117.0, 130.0, 131.0, 202.0, 132.0, 141.0]
2025-05-13 14:21:01,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 22 minutes, 1 second)
2025-05-13 14:25:35,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:25:38,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 796.39294 ± 245.055
2025-05-13 14:25:38,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [970.8257, 711.8934, 723.34283, 714.48975, 1293.0171, 1011.6306, 768.72504, 870.9291, 523.7298, 375.3463]
2025-05-13 14:25:38,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 127.0, 145.0, 127.0, 248.0, 188.0, 140.0, 174.0, 102.0, 68.0]
2025-05-13 14:25:38,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 17 minutes, 18 seconds)
2025-05-13 14:30:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:30:07,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 725.37323 ± 120.089
2025-05-13 14:30:07,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [603.0801, 752.3452, 890.9175, 931.07666, 703.7424, 525.09393, 634.72125, 680.44867, 717.8309, 814.4755]
2025-05-13 14:30:07,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 136.0, 165.0, 163.0, 126.0, 94.0, 127.0, 122.0, 128.0, 150.0]
2025-05-13 14:30:07,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 12 minutes, 33 seconds)
2025-05-13 14:34:40,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:34:43,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 858.61169 ± 149.408
2025-05-13 14:34:43,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [758.7437, 823.7886, 768.0306, 912.4483, 749.67285, 768.4457, 1121.4001, 1128.6613, 891.0078, 663.9184]
2025-05-13 14:34:43,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 143.0, 135.0, 165.0, 135.0, 140.0, 193.0, 205.0, 169.0, 128.0]
2025-05-13 14:34:43,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 8 minutes, 8 seconds)
2025-05-13 14:39:12,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:39:15,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 747.72980 ± 100.104
2025-05-13 14:39:15,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [683.54926, 860.9932, 722.5371, 945.1513, 774.35944, 816.20624, 722.6843, 587.5979, 641.0599, 723.15845]
2025-05-13 14:39:15,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 151.0, 143.0, 166.0, 137.0, 151.0, 148.0, 119.0, 123.0, 150.0]
2025-05-13 14:39:15,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 3 minutes, 19 seconds)
2025-05-13 14:43:47,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:43:49,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 855.05194 ± 173.191
2025-05-13 14:43:49,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [736.5708, 849.9055, 780.7149, 568.7274, 1028.3601, 1026.0585, 740.9276, 1008.0982, 684.73065, 1126.4257]
2025-05-13 14:43:49,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 156.0, 146.0, 110.0, 195.0, 183.0, 145.0, 171.0, 124.0, 196.0]
2025-05-13 14:43:49,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 58 minutes, 35 seconds)
2025-05-13 14:48:23,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:48:25,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 724.85071 ± 209.722
2025-05-13 14:48:25,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [489.18478, 1021.86615, 682.62427, 525.9676, 962.9635, 826.28094, 1039.7346, 501.08954, 540.00385, 658.7922]
2025-05-13 14:48:25,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 193.0, 139.0, 107.0, 170.0, 143.0, 191.0, 92.0, 99.0, 135.0]
2025-05-13 14:48:25,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 53 minutes, 58 seconds)
2025-05-13 14:52:55,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:52:58,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1030.55078 ± 185.860
2025-05-13 14:52:58,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [983.2151, 1083.8561, 796.1103, 1292.4192, 1213.1052, 767.2578, 1158.7225, 877.7931, 1256.0138, 877.01465]
2025-05-13 14:52:58,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 195.0, 145.0, 241.0, 216.0, 137.0, 227.0, 153.0, 241.0, 155.0]
2025-05-13 14:52:58,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (1030.55) for latency MM1Queue_a033_s075
2025-05-13 14:52:58,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 49 minutes, 39 seconds)
2025-05-13 14:57:30,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:57:33,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 923.18152 ± 123.797
2025-05-13 14:57:33,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [845.4373, 1196.0044, 986.3082, 952.2722, 945.79376, 770.5401, 722.6975, 911.0414, 984.51605, 917.20483]
2025-05-13 14:57:33,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 240.0, 183.0, 177.0, 173.0, 159.0, 137.0, 157.0, 180.0, 167.0]
2025-05-13 14:57:33,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 45 minutes, 3 seconds)
2025-05-13 15:02:04,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:02:06,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 832.02979 ± 280.533
2025-05-13 15:02:06,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [832.7912, 402.54364, 921.1853, 837.47076, 413.76776, 1421.0995, 822.2348, 1092.5156, 767.8413, 808.84827]
2025-05-13 15:02:06,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 74.0, 163.0, 154.0, 92.0, 247.0, 147.0, 188.0, 144.0, 160.0]
2025-05-13 15:02:06,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 40 minutes, 35 seconds)
2025-05-13 15:06:41,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:06:44,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 892.30408 ± 167.837
2025-05-13 15:06:44,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [753.822, 1026.134, 1159.7274, 757.1439, 768.80756, 902.1996, 819.622, 750.9689, 1209.3464, 775.26996]
2025-05-13 15:06:44,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 183.0, 200.0, 133.0, 139.0, 162.0, 140.0, 131.0, 207.0, 143.0]
2025-05-13 15:06:44,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 36 minutes, 14 seconds)
2025-05-13 15:11:15,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:11:17,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 884.07910 ± 127.602
2025-05-13 15:11:17,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [816.14294, 959.1237, 674.7387, 804.09644, 1053.7598, 1041.0863, 843.77295, 1024.895, 719.53253, 903.6424]
2025-05-13 15:11:17,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 182.0, 120.0, 143.0, 179.0, 174.0, 148.0, 172.0, 124.0, 159.0]
2025-05-13 15:11:17,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 31 minutes, 27 seconds)
2025-05-13 15:15:48,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:15:50,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 805.23230 ± 116.872
2025-05-13 15:15:50,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [757.7021, 758.8139, 945.4482, 828.98376, 691.9644, 899.14685, 1047.6268, 687.8695, 751.57825, 683.1891]
2025-05-13 15:15:50,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 139.0, 197.0, 148.0, 139.0, 174.0, 182.0, 121.0, 158.0, 119.0]
2025-05-13 15:15:50,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 26 minutes, 53 seconds)
2025-05-13 15:20:21,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:20:24,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1015.73596 ± 238.511
2025-05-13 15:20:24,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1382.0874, 988.23126, 1141.0518, 980.61554, 1410.2119, 828.3313, 1033.1635, 716.96277, 1030.3654, 646.3388]
2025-05-13 15:20:24,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [261.0, 176.0, 198.0, 173.0, 239.0, 148.0, 181.0, 132.0, 191.0, 115.0]
2025-05-13 15:20:24,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 22 minutes, 14 seconds)
2025-05-13 15:24:56,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:24:58,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 821.64240 ± 345.684
2025-05-13 15:24:58,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1217.2747, 1238.2867, 592.1576, 800.84753, 1086.1316, 169.58792, 741.7267, 748.1862, 1201.3859, 420.83878]
2025-05-13 15:24:58,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [228.0, 232.0, 118.0, 144.0, 210.0, 34.0, 134.0, 134.0, 212.0, 77.0]
2025-05-13 15:24:58,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 17 minutes, 45 seconds)
2025-05-13 15:29:29,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:29:32,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 925.51123 ± 332.392
2025-05-13 15:29:32,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1010.5232, 1319.3185, 1112.0602, 629.87, 936.40454, 730.4607, 625.0184, 1626.5149, 752.2944, 512.64734]
2025-05-13 15:29:32,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 232.0, 191.0, 133.0, 167.0, 153.0, 131.0, 291.0, 136.0, 104.0]
2025-05-13 15:29:32,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 12 minutes, 57 seconds)
2025-05-13 15:34:05,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:34:08,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 903.37030 ± 317.305
2025-05-13 15:34:08,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1289.4574, 938.33826, 447.7038, 1335.6066, 302.1922, 883.34674, 992.4418, 790.636, 1183.6051, 870.3758]
2025-05-13 15:34:08,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 171.0, 84.0, 226.0, 59.0, 152.0, 175.0, 137.0, 234.0, 158.0]
2025-05-13 15:34:08,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 8 minutes, 31 seconds)
2025-05-13 15:38:40,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:38:42,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 914.31073 ± 175.594
2025-05-13 15:38:42,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1028.8812, 1012.03815, 748.2539, 786.0811, 767.7077, 859.0335, 1238.2451, 847.78705, 695.8284, 1159.2509]
2025-05-13 15:38:42,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 180.0, 134.0, 136.0, 139.0, 152.0, 216.0, 156.0, 122.0, 199.0]
2025-05-13 15:38:42,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 4 minutes, 2 seconds)
2025-05-13 15:43:16,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:43:18,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 938.84338 ± 213.458
2025-05-13 15:43:18,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1250.0818, 899.0361, 1155.6398, 784.4298, 855.7302, 650.1261, 1274.7521, 718.3331, 774.4011, 1025.903]
2025-05-13 15:43:18,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 158.0, 204.0, 143.0, 158.0, 118.0, 217.0, 128.0, 150.0, 178.0]
2025-05-13 15:43:18,830 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 59 minutes, 34 seconds)
2025-05-13 15:47:51,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:47:54,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 900.08282 ± 310.305
2025-05-13 15:47:54,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1372.011, 269.2694, 1057.2498, 1098.5465, 1213.964, 1060.9308, 749.4269, 561.97345, 814.32245, 803.13385]
2025-05-13 15:47:54,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [238.0, 52.0, 177.0, 217.0, 207.0, 197.0, 140.0, 112.0, 147.0, 139.0]
2025-05-13 15:47:54,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 55 minutes)
2025-05-13 15:52:22,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:52:24,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 729.82733 ± 134.671
2025-05-13 15:52:24,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [637.06836, 724.4838, 835.84827, 643.51105, 684.4021, 588.5633, 693.71375, 1087.9852, 721.5641, 681.1333]
2025-05-13 15:52:24,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 148.0, 151.0, 128.0, 132.0, 119.0, 142.0, 205.0, 145.0, 135.0]
2025-05-13 15:52:24,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 50 minutes, 18 seconds)
2025-05-13 15:56:59,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:57:02,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 864.12024 ± 249.663
2025-05-13 15:57:02,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [832.38257, 543.54395, 1051.9161, 1351.178, 1138.9348, 968.69037, 741.3956, 764.2915, 522.7845, 726.08435]
2025-05-13 15:57:02,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 111.0, 180.0, 232.0, 198.0, 189.0, 132.0, 137.0, 99.0, 127.0]
2025-05-13 15:57:02,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 45 minutes, 48 seconds)
2025-05-13 16:01:35,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:01:37,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 833.30762 ± 296.420
2025-05-13 16:01:37,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [863.75476, 752.4927, 1622.5702, 963.48975, 719.122, 577.0717, 795.6291, 555.26605, 572.7931, 910.88684]
2025-05-13 16:01:37,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 154.0, 305.0, 178.0, 148.0, 115.0, 151.0, 112.0, 117.0, 175.0]
2025-05-13 16:01:37,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 41 minutes, 14 seconds)
2025-05-13 16:06:06,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:06:08,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 823.85175 ± 118.709
2025-05-13 16:06:08,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [679.5352, 1037.2318, 754.9713, 952.0662, 768.2958, 972.6267, 672.8014, 765.97205, 786.608, 848.40875]
2025-05-13 16:06:08,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 179.0, 138.0, 163.0, 132.0, 169.0, 120.0, 133.0, 137.0, 146.0]
2025-05-13 16:06:08,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 36 minutes, 32 seconds)
2025-05-13 16:10:41,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:10:43,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 914.80304 ± 190.439
2025-05-13 16:10:43,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [980.52435, 765.36456, 1093.6172, 975.2983, 1153.9172, 663.0568, 1056.1572, 551.551, 842.95447, 1065.5886]
2025-05-13 16:10:43,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 144.0, 187.0, 198.0, 201.0, 118.0, 180.0, 103.0, 155.0, 181.0]
2025-05-13 16:10:43,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 31 minutes, 57 seconds)
2025-05-13 16:15:14,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:15:17,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 862.42908 ± 82.453
2025-05-13 16:15:17,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [871.5756, 863.3004, 800.0271, 1020.2217, 909.64514, 868.72437, 967.3966, 793.4073, 797.16925, 732.8234]
2025-05-13 16:15:17,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 151.0, 158.0, 177.0, 164.0, 173.0, 173.0, 138.0, 150.0, 147.0]
2025-05-13 16:15:17,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 27 minutes, 26 seconds)
2025-05-13 16:19:48,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:19:51,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 831.97803 ± 223.664
2025-05-13 16:19:51,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [824.33875, 788.0779, 747.6199, 1009.5856, 602.7781, 372.01074, 1076.0198, 1023.22064, 742.02045, 1134.1084]
2025-05-13 16:19:51,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 137.0, 128.0, 173.0, 116.0, 69.0, 191.0, 175.0, 132.0, 197.0]
2025-05-13 16:19:51,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 22 minutes, 48 seconds)
2025-05-13 16:24:21,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:24:23,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 811.70715 ± 182.866
2025-05-13 16:24:23,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [825.49286, 602.1435, 783.2884, 858.0291, 843.1414, 705.11725, 559.87946, 956.09875, 1242.6932, 741.1875]
2025-05-13 16:24:23,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 104.0, 153.0, 171.0, 148.0, 148.0, 105.0, 195.0, 212.0, 137.0]
2025-05-13 16:24:23,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 18 minutes, 12 seconds)
2025-05-13 16:29:01,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:29:03,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 875.57568 ± 215.521
2025-05-13 16:29:03,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [605.0618, 725.82056, 794.08673, 1032.1116, 548.8515, 1002.82544, 958.20386, 735.511, 1117.1641, 1236.1199]
2025-05-13 16:29:03,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 134.0, 137.0, 178.0, 111.0, 175.0, 165.0, 129.0, 200.0, 217.0]
2025-05-13 16:29:03,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 44 seconds)
2025-05-13 16:33:33,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:33:35,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 853.41895 ± 310.238
2025-05-13 16:33:35,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [760.5033, 1588.114, 668.2314, 737.40717, 727.8038, 719.50323, 1009.99835, 347.59338, 1077.6882, 897.3462]
2025-05-13 16:33:35,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 298.0, 117.0, 126.0, 124.0, 127.0, 182.0, 65.0, 191.0, 157.0]
2025-05-13 16:33:35,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 9 minutes, 8 seconds)
2025-05-13 16:38:04,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:38:06,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 822.98828 ± 229.647
2025-05-13 16:38:06,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1139.968, 677.9207, 687.0293, 702.6089, 1066.0165, 1132.5228, 469.21622, 603.53876, 1015.4795, 735.5817]
2025-05-13 16:38:06,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 125.0, 139.0, 123.0, 189.0, 214.0, 94.0, 124.0, 172.0, 125.0]
2025-05-13 16:38:06,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 33 seconds)
2025-05-13 16:42:39,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:42:41,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 877.11768 ± 401.339
2025-05-13 16:42:41,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1803.6348, 708.4771, 847.5395, 1225.6072, 271.47232, 869.49866, 406.0142, 791.5216, 946.81354, 900.5981]
2025-05-13 16:42:41,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [338.0, 143.0, 144.0, 221.0, 58.0, 149.0, 90.0, 139.0, 177.0, 182.0]
2025-05-13 16:42:41,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1251 [DEBUG]: Training session finished
