2025-05-13 09:06:30,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mda-mem4
2025-05-13 09:06:30,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mda-mem4
2025-05-13 09:06:30,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x146b86d1de10>}
2025-05-13 09:06:30,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:30,118 baseline-bpql-mda-noisy-halfcheetah:91 [WARNING]: args.assumed_delay != args.horizon: 4 != 24
2025-05-13 09:06:30,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-13 09:06:30,129 baseline-bpql-mda-noisy-halfcheetah:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:30,129 baseline-bpql-mda-noisy-halfcheetah:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:30,134 baseline-bpql-mda-noisy-halfcheetah:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:30,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:30,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:14,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:10:27,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -335.06476 ± 20.373
2025-05-13 09:10:27,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-343.57852, -344.31915, -371.33167, -310.3688, -343.82733, -324.17886, -345.58655, -331.21954, -293.91074, -342.32657]
2025-05-13 09:10:27,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:10:27,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (-335.06) for latency MM1Queue_a033_s075
2025-05-13 09:10:27,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 30 minutes, 36 seconds)
2025-05-13 09:14:15,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:14:28,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 169.20451 ± 159.133
2025-05-13 09:14:28,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [388.36972, 142.59085, -29.077608, 260.2734, 335.47668, -24.841125, 11.457033, 225.98984, 14.081676, 367.72467]
2025-05-13 09:14:28,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:14:28,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (169.20) for latency MM1Queue_a033_s075
2025-05-13 09:14:28,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 29 minutes, 45 seconds)
2025-05-13 09:18:15,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:18:28,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1074.98181 ± 136.676
2025-05-13 09:18:28,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1031.0726, 1067.5824, 809.1872, 1294.6364, 1145.2719, 1217.1255, 916.24084, 1040.4684, 1038.8567, 1189.3771]
2025-05-13 09:18:28,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:18:28,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1074.98) for latency MM1Queue_a033_s075
2025-05-13 09:18:28,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 26 minutes, 49 seconds)
2025-05-13 09:22:15,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:22:29,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1327.05811 ± 340.525
2025-05-13 09:22:29,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1048.7877, 1531.797, 1737.2234, 1525.0017, 1020.56, 805.0404, 1725.5653, 851.823, 1593.03, 1431.7526]
2025-05-13 09:22:29,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:22:29,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1327.06) for latency MM1Queue_a033_s075
2025-05-13 09:22:29,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 23 minutes, 22 seconds)
2025-05-13 09:26:16,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:26:29,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1529.12292 ± 374.408
2025-05-13 09:26:29,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1037.6626, 2070.3242, 933.161, 1673.6573, 1172.3503, 1688.4281, 1809.5408, 1673.4459, 1281.6799, 1950.9794]
2025-05-13 09:26:29,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:26:29,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1529.12) for latency MM1Queue_a033_s075
2025-05-13 09:26:29,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 19 minutes, 42 seconds)
2025-05-13 09:30:17,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:30:30,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1934.65039 ± 526.368
2025-05-13 09:30:30,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1578.0503, 1956.3883, 853.579, 2347.6458, 1662.0073, 2322.2446, 2188.056, 1379.9298, 2529.6445, 2528.9592]
2025-05-13 09:30:30,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:30:30,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1934.65) for latency MM1Queue_a033_s075
2025-05-13 09:30:30,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 16 minutes, 52 seconds)
2025-05-13 09:34:17,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:34:30,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2318.77686 ± 478.144
2025-05-13 09:34:30,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2356.7363, 2736.9553, 2780.6304, 1197.4142, 2492.4822, 2604.5535, 1741.6678, 2451.3455, 2694.8767, 2131.108]
2025-05-13 09:34:30,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:34:30,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2318.78) for latency MM1Queue_a033_s075
2025-05-13 09:34:30,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 12 minutes, 50 seconds)
2025-05-13 09:38:17,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:38:31,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2672.48193 ± 219.200
2025-05-13 09:38:31,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2758.9082, 2449.4875, 2596.543, 2512.6614, 3026.987, 2950.6023, 2318.6016, 2835.108, 2770.0513, 2505.8684]
2025-05-13 09:38:31,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:38:31,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2672.48) for latency MM1Queue_a033_s075
2025-05-13 09:38:31,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 8 minutes, 44 seconds)
2025-05-13 09:42:18,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:42:31,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2688.43799 ± 651.602
2025-05-13 09:42:31,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [845.2059, 2737.7314, 3333.3499, 2717.6038, 3004.6433, 2823.2703, 2500.9272, 3036.6746, 2829.6252, 3055.347]
2025-05-13 09:42:31,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:42:31,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2688.44) for latency MM1Queue_a033_s075
2025-05-13 09:42:31,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 4 minutes, 48 seconds)
2025-05-13 09:46:19,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:46:32,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2548.29395 ± 684.154
2025-05-13 09:46:32,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [695.6473, 1950.0355, 2659.1162, 2961.798, 2968.3445, 2959.1587, 3005.1177, 2654.1497, 2816.7136, 2812.8594]
2025-05-13 09:46:32,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:46:32,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 46 seconds)
2025-05-13 09:50:19,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:50:33,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3013.08936 ± 262.443
2025-05-13 09:50:33,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3082.2878, 2842.3325, 3087.4297, 3189.723, 2888.1519, 2568.327, 3087.6462, 3335.7808, 2640.7375, 3408.477]
2025-05-13 09:50:33,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:50:33,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3013.09) for latency MM1Queue_a033_s075
2025-05-13 09:50:33,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 56 minutes, 47 seconds)
2025-05-13 09:54:20,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:54:34,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2928.77563 ± 783.852
2025-05-13 09:54:34,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3065.2405, 3075.2024, 3341.648, 632.0664, 3403.3098, 3317.3396, 2945.8738, 3377.7656, 2916.5527, 3212.7578]
2025-05-13 09:54:34,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:54:34,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 52 minutes, 56 seconds)
2025-05-13 09:58:21,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:58:34,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2928.82178 ± 381.920
2025-05-13 09:58:34,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3149.9387, 3188.8953, 2980.4207, 2797.3086, 3075.3008, 3000.2778, 3014.9668, 3090.0378, 1829.6917, 3161.3784]
2025-05-13 09:58:34,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:58:34,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 49 minutes, 1 second)
2025-05-13 10:02:22,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:02:35,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2930.55664 ± 607.459
2025-05-13 10:02:35,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3046.2568, 3311.5103, 1203.1635, 3119.304, 2829.356, 3262.3152, 3460.8567, 3134.934, 3150.3137, 2787.5562]
2025-05-13 10:02:35,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:02:35,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 45 minutes, 2 seconds)
2025-05-13 10:06:22,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:06:36,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3072.19214 ± 172.616
2025-05-13 10:06:36,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3157.4153, 3094.829, 2964.284, 3380.1367, 2734.6855, 2937.6033, 3121.5618, 3208.26, 2938.0588, 3185.0867]
2025-05-13 10:06:36,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:06:36,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3072.19) for latency MM1Queue_a033_s075
2025-05-13 10:06:36,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 40 minutes, 59 seconds)
2025-05-13 10:10:23,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:10:36,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3178.44385 ± 236.418
2025-05-13 10:10:36,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2931.957, 2718.3262, 3201.017, 3011.523, 3462.1685, 3473.7185, 3316.211, 3073.6025, 3424.2, 3171.714]
2025-05-13 10:10:36,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:10:36,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3178.44) for latency MM1Queue_a033_s075
2025-05-13 10:10:36,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 37 minutes)
2025-05-13 10:14:24,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:14:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3315.58057 ± 304.457
2025-05-13 10:14:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3394.882, 3242.068, 3688.4065, 3361.4258, 2982.2786, 3579.205, 3007.9495, 3476.9297, 3686.278, 2736.3826]
2025-05-13 10:14:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:14:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3315.58) for latency MM1Queue_a033_s075
2025-05-13 10:14:37,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 32 minutes, 52 seconds)
2025-05-13 10:18:24,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:18:36,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3142.33130 ± 719.692
2025-05-13 10:18:36,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3544.8018, 1032.3259, 3357.8037, 3313.7349, 3360.921, 3170.2034, 3565.3777, 3356.394, 3115.1313, 3606.6194]
2025-05-13 10:18:36,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:18:36,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 28 minutes, 38 seconds)
2025-05-13 10:22:23,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:22:36,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3448.46729 ± 222.905
2025-05-13 10:22:36,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3534.9734, 3438.8542, 3729.3306, 3157.4084, 3339.499, 3138.216, 3226.6667, 3461.2239, 3656.7107, 3801.7896]
2025-05-13 10:22:36,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:22:36,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3448.47) for latency MM1Queue_a033_s075
2025-05-13 10:22:36,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 24 minutes, 7 seconds)
2025-05-13 10:26:21,260 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:26:34,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3030.69995 ± 896.578
2025-05-13 10:26:34,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3511.8008, 3512.5469, 1113.9297, 2847.5579, 3260.4607, 3435.9192, 3424.0203, 4192.9546, 3395.4739, 1612.3363]
2025-05-13 10:26:34,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:26:34,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 19 minutes, 27 seconds)
2025-05-13 10:30:19,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:30:32,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3051.57764 ± 669.379
2025-05-13 10:30:32,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2672.653, 3362.9558, 3194.331, 1156.4928, 3418.3484, 3367.8428, 3385.7686, 3425.4556, 3107.0945, 3424.8337]
2025-05-13 10:30:32,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:30:32,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 14 minutes, 48 seconds)
2025-05-13 10:34:17,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:34:29,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3219.22534 ± 292.155
2025-05-13 10:34:29,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3358.104, 3359.954, 2506.22, 3590.5422, 2880.4075, 3218.2332, 3325.8909, 3342.5623, 3382.7493, 3227.5894]
2025-05-13 10:34:29,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:34:29,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 10 minutes, 3 seconds)
2025-05-13 10:38:14,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:38:27,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3374.14795 ± 565.417
2025-05-13 10:38:27,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3266.7656, 3373.5298, 3795.8445, 3635.9595, 3441.281, 3625.8596, 1738.2753, 3545.6482, 3618.9016, 3699.4146]
2025-05-13 10:38:27,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:38:27,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 5 minutes, 27 seconds)
2025-05-13 10:42:11,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:42:24,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3086.30811 ± 792.521
2025-05-13 10:42:24,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3193.8413, 3346.0713, 3423.568, 3497.4314, 3476.3376, 3215.705, 3262.7288, 3531.9417, 738.4834, 3176.9705]
2025-05-13 10:42:24,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:42:24,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 1 minute, 6 seconds)
2025-05-13 10:46:08,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:46:21,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3321.47314 ± 704.873
2025-05-13 10:46:21,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3519.789, 3664.6797, 3525.8188, 3445.342, 4020.0603, 3788.3484, 1456.5017, 3642.9417, 3491.497, 2659.7524]
2025-05-13 10:46:21,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:46:21,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 56 minutes, 52 seconds)
2025-05-13 10:50:06,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:50:18,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3786.07227 ± 124.092
2025-05-13 10:50:18,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3689.3115, 3977.1492, 3935.9556, 3586.5862, 3733.1226, 3938.0132, 3839.5283, 3736.7144, 3668.8684, 3755.4727]
2025-05-13 10:50:18,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:50:18,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3786.07) for latency MM1Queue_a033_s075
2025-05-13 10:50:18,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 52 minutes, 43 seconds)
2025-05-13 10:54:03,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:54:16,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3433.59229 ± 620.521
2025-05-13 10:54:16,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3462.1936, 4025.7288, 2821.9548, 3828.045, 3796.188, 3577.8503, 3446.7993, 3730.5198, 3819.2744, 1827.3689]
2025-05-13 10:54:16,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:54:16,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 48 minutes, 39 seconds)
2025-05-13 10:58:00,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:58:13,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3459.76050 ± 673.895
2025-05-13 10:58:13,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3828.757, 3521.5933, 3351.4404, 3552.4512, 3501.9084, 1520.6469, 3818.8738, 3703.448, 3750.1475, 4048.339]
2025-05-13 10:58:13,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:58:13,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 44 minutes, 41 seconds)
2025-05-13 11:01:58,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:02:10,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3806.57495 ± 201.245
2025-05-13 11:02:10,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3802.1836, 4055.5466, 3635.7354, 3849.4353, 3823.1763, 3997.5337, 4044.4573, 3405.7886, 3875.9563, 3575.9385]
2025-05-13 11:02:10,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:02:10,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3806.57) for latency MM1Queue_a033_s075
2025-05-13 11:02:10,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 40 minutes, 43 seconds)
2025-05-13 11:05:55,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:06:08,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3725.70166 ± 328.578
2025-05-13 11:06:08,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3650.71, 3720.005, 3560.3235, 4146.4424, 3533.8586, 4193.4873, 3712.426, 3620.8208, 3036.2405, 4082.7012]
2025-05-13 11:06:08,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:06:08,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 36 minutes, 56 seconds)
2025-05-13 11:09:53,371 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:10:06,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3572.43628 ± 426.624
2025-05-13 11:10:06,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3380.9272, 2700.4478, 4030.468, 3613.081, 4285.148, 3509.9863, 3109.411, 3563.8232, 3838.6594, 3692.4119]
2025-05-13 11:10:06,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:10:06,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 33 minutes, 7 seconds)
2025-05-13 11:13:51,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:14:04,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3366.56445 ± 733.218
2025-05-13 11:14:04,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1835.3701, 3796.6028, 3227.9207, 2167.147, 3502.21, 3934.373, 3628.3127, 3473.252, 4179.221, 3921.2358]
2025-05-13 11:14:04,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:14:04,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 29 minutes, 21 seconds)
2025-05-13 11:17:49,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:18:02,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3608.37378 ± 929.224
2025-05-13 11:18:02,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3695.07, 3816.8933, 3439.1558, 4071.6475, 959.0988, 3856.6958, 3768.3525, 4403.442, 3693.6472, 4379.7373]
2025-05-13 11:18:02,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:18:02,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 25 minutes, 37 seconds)
2025-05-13 11:21:47,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:22:00,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3621.24341 ± 633.800
2025-05-13 11:22:00,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3640.0203, 3418.157, 4035.6848, 3829.8052, 3525.517, 1924.849, 3559.499, 4050.388, 4429.811, 3798.7014]
2025-05-13 11:22:00,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:22:00,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 21 minutes, 44 seconds)
2025-05-13 11:25:45,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:25:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3566.54639 ± 375.233
2025-05-13 11:25:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3904.1724, 3131.0906, 2710.5747, 3466.9946, 3936.3071, 3947.0427, 3699.2175, 3632.2397, 3456.2988, 3781.5278]
2025-05-13 11:25:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:25:58,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 17 minutes, 54 seconds)
2025-05-13 11:29:43,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:29:56,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3706.02002 ± 447.447
2025-05-13 11:29:56,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4074.0254, 3492.0098, 3944.516, 3958.7712, 3964.3079, 3672.5767, 2494.952, 4095.047, 3762.8691, 3601.1213]
2025-05-13 11:29:56,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:29:56,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 13 minutes, 57 seconds)
2025-05-13 11:33:42,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:33:54,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3832.20361 ± 492.514
2025-05-13 11:33:54,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4227.677, 3786.6367, 3828.2913, 3503.4407, 3957.5186, 4282.5366, 2518.6667, 3938.6426, 4051.889, 4226.735]
2025-05-13 11:33:54,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:33:54,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3832.20) for latency MM1Queue_a033_s075
2025-05-13 11:33:55,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 10 minutes, 1 second)
2025-05-13 11:37:40,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:37:53,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3819.95654 ± 330.999
2025-05-13 11:37:53,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3428.0322, 3625.0447, 3742.1973, 3759.3918, 3519.7434, 3457.1929, 4402.6626, 3985.673, 4364.4736, 3915.155]
2025-05-13 11:37:53,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:37:53,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 6 minutes)
2025-05-13 11:41:38,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:41:51,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3884.76440 ± 255.035
2025-05-13 11:41:51,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3420.39, 3915.046, 4436.1035, 3986.292, 3713.0266, 4087.635, 3894.814, 3910.5996, 3802.4023, 3681.3367]
2025-05-13 11:41:51,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:41:51,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3884.76) for latency MM1Queue_a033_s075
2025-05-13 11:41:51,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 2 minutes, 6 seconds)
2025-05-13 11:45:36,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:45:49,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3812.67896 ± 540.608
2025-05-13 11:45:49,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3651.0664, 3868.035, 4308.5967, 3586.1257, 2340.4436, 4011.8928, 4167.2, 4076.598, 4249.271, 3867.557]
2025-05-13 11:45:49,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:45:49,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 58 minutes, 4 seconds)
2025-05-13 11:49:34,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:49:47,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3864.43213 ± 300.298
2025-05-13 11:49:47,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4162.059, 4097.9146, 4193.541, 3803.843, 3597.7764, 3344.927, 3581.659, 3923.1475, 3653.391, 4286.061]
2025-05-13 11:49:47,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:49:47,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 54 minutes, 5 seconds)
2025-05-13 11:53:31,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:53:44,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3858.44092 ± 393.556
2025-05-13 11:53:44,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3566.16, 3815.3472, 3854.0774, 4454.8643, 4044.5034, 4210.597, 2922.1748, 3847.7283, 4106.1377, 3762.819]
2025-05-13 11:53:44,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:53:44,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 49 minutes, 58 seconds)
2025-05-13 11:57:29,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:57:42,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3544.12891 ± 946.782
2025-05-13 11:57:42,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3783.4617, 3888.423, 4253.4404, 4128.583, 3885.903, 4369.2676, 4031.524, 3687.4854, 1447.1355, 1966.0647]
2025-05-13 11:57:42,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:57:42,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 45 minutes, 54 seconds)
2025-05-13 12:01:26,811 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:01:39,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3728.84912 ± 848.303
2025-05-13 12:01:39,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1289.2205, 4085.564, 4105.9805, 3692.5098, 3820.1528, 3755.5454, 4568.643, 3893.5989, 3896.8345, 4180.442]
2025-05-13 12:01:39,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:01:39,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 41 minutes, 49 seconds)
2025-05-13 12:05:24,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:05:37,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3560.63330 ± 497.860
2025-05-13 12:05:37,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3835.388, 3894.7783, 4099.0654, 3975.8972, 3620.1409, 3764.226, 3489.469, 2724.3806, 2527.8398, 3675.147]
2025-05-13 12:05:37,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:05:37,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 37 minutes, 53 seconds)
2025-05-13 12:09:23,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:09:35,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3913.72339 ± 414.105
2025-05-13 12:09:35,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3801.8435, 4268.3574, 4129.1733, 3938.0671, 4156.6562, 4420.357, 3786.4988, 2846.008, 3726.3975, 4063.8765]
2025-05-13 12:09:35,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:09:35,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3913.72) for latency MM1Queue_a033_s075
2025-05-13 12:09:35,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 33 minutes, 58 seconds)
2025-05-13 12:13:20,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:13:33,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3937.80078 ± 283.611
2025-05-13 12:13:33,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3673.785, 4355.9883, 3757.957, 4135.147, 3733.9155, 3988.0322, 3981.8843, 4304.7285, 3394.824, 4051.7454]
2025-05-13 12:13:33,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:13:33,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3937.80) for latency MM1Queue_a033_s075
2025-05-13 12:13:33,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 30 minutes, 2 seconds)
2025-05-13 12:17:18,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:17:30,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3958.46240 ± 264.519
2025-05-13 12:17:30,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4351.501, 3756.9, 3925.7366, 4082.7769, 3676.4556, 4348.3423, 4127.1777, 3753.132, 4029.434, 3533.1675]
2025-05-13 12:17:30,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:17:30,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3958.46) for latency MM1Queue_a033_s075
2025-05-13 12:17:30,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 26 minutes, 3 seconds)
2025-05-13 12:21:15,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:21:28,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4053.23315 ± 161.509
2025-05-13 12:21:28,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4114.091, 3870.075, 4029.5059, 4020.2988, 4244.9937, 3871.9966, 3855.6953, 4208.9297, 4341.355, 3975.3916]
2025-05-13 12:21:28,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:21:28,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4053.23) for latency MM1Queue_a033_s075
2025-05-13 12:21:28,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 22 minutes, 8 seconds)
2025-05-13 12:25:13,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:25:25,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3618.58594 ± 483.536
2025-05-13 12:25:25,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2577.572, 3401.7715, 4010.4417, 4137.161, 3769.3118, 2923.866, 3926.4587, 3843.9575, 3984.4976, 3610.8203]
2025-05-13 12:25:25,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:25:25,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 18 minutes, 3 seconds)
2025-05-13 12:29:10,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:29:23,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3983.36401 ± 183.621
2025-05-13 12:29:23,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4230.304, 3923.9949, 3728.2737, 4085.3828, 3793.33, 4083.4539, 4235.211, 4077.5803, 3702.9263, 3973.1848]
2025-05-13 12:29:23,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:29:23,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 13 minutes, 57 seconds)
2025-05-13 12:33:08,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:33:20,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4087.19995 ± 246.206
2025-05-13 12:33:20,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3812.9756, 3811.4385, 4175.5327, 4612.7476, 4093.7341, 3912.7275, 4317.9995, 3926.5273, 4278.4917, 3929.8235]
2025-05-13 12:33:20,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:33:20,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4087.20) for latency MM1Queue_a033_s075
2025-05-13 12:33:20,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 9 minutes, 58 seconds)
2025-05-13 12:37:05,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:37:18,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3889.78662 ± 625.840
2025-05-13 12:37:18,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4326.4204, 4045.9624, 4684.0557, 4208.4614, 3810.218, 3815.907, 2209.743, 4033.1724, 4121.0034, 3642.9238]
2025-05-13 12:37:18,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:37:18,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 6 minutes, 4 seconds)
2025-05-13 12:41:03,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:41:16,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3705.99072 ± 822.287
2025-05-13 12:41:16,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3754.482, 4465.744, 4374.601, 3888.1877, 4039.994, 1366.8573, 3898.9033, 3938.2288, 3556.0847, 3776.8232]
2025-05-13 12:41:16,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:41:16,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 2 minutes, 4 seconds)
2025-05-13 12:45:00,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:45:13,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3742.92041 ± 1051.549
2025-05-13 12:45:13,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4406.2427, 4127.71, 4035.5972, 4089.102, 4108.866, 4029.3784, 2901.2378, 909.4865, 4868.58, 3952.999]
2025-05-13 12:45:13,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:45:13,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 58 minutes, 10 seconds)
2025-05-13 12:48:58,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:49:11,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4171.88818 ± 216.145
2025-05-13 12:49:11,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4046.7764, 4397.7188, 4100.3154, 3950.248, 4245.9404, 4275.102, 3915.7378, 3864.2258, 4494.861, 4427.957]
2025-05-13 12:49:11,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:49:11,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4171.89) for latency MM1Queue_a033_s075
2025-05-13 12:49:11,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 54 minutes, 14 seconds)
2025-05-13 12:52:55,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:53:08,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4221.70410 ± 308.124
2025-05-13 12:53:08,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4230.391, 4008.9766, 3617.3423, 4410.6064, 4043.4727, 4769.662, 4412.12, 3942.9912, 4386.9966, 4394.4834]
2025-05-13 12:53:08,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:53:08,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4221.70) for latency MM1Queue_a033_s075
2025-05-13 12:53:08,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 50 minutes, 13 seconds)
2025-05-13 12:56:53,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:57:05,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3970.62842 ± 484.242
2025-05-13 12:57:05,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4518.6646, 3647.6118, 3868.194, 4157.805, 2731.0496, 3978.0422, 3922.2761, 4127.6543, 4379.079, 4375.9087]
2025-05-13 12:57:05,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:57:05,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 46 minutes, 14 seconds)
2025-05-13 13:00:50,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:01:03,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3857.23438 ± 759.358
2025-05-13 13:01:03,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1718.4132, 4000.2212, 4343.119, 4469.6997, 3564.2734, 3887.2905, 4390.0264, 4099.796, 3881.6907, 4217.813]
2025-05-13 13:01:03,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:01:03,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 42 minutes, 18 seconds)
2025-05-13 13:04:48,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:05:01,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4068.75244 ± 337.308
2025-05-13 13:05:01,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4623.6133, 4404.22, 4047.5234, 3378.8643, 4216.5513, 4247.326, 4210.362, 3857.5837, 3968.0474, 3733.4302]
2025-05-13 13:05:01,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:05:01,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 38 minutes, 17 seconds)
2025-05-13 13:08:45,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:08:58,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3847.76709 ± 550.547
2025-05-13 13:08:58,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4414.9253, 3029.1348, 2853.2783, 4191.762, 3519.653, 4036.7659, 4465.2812, 4393.4746, 4012.4192, 3560.9783]
2025-05-13 13:08:58,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:08:58,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 34 minutes, 20 seconds)
2025-05-13 13:12:43,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:12:55,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4051.05469 ± 261.632
2025-05-13 13:12:55,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3614.4248, 4236.9683, 4236.2144, 3527.0186, 4022.1316, 3962.2275, 4130.491, 4292.4297, 4278.9746, 4209.664]
2025-05-13 13:12:55,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:12:55,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 30 minutes, 24 seconds)
2025-05-13 13:16:40,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:16:52,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4061.24731 ± 496.287
2025-05-13 13:16:52,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3881.4785, 2726.4185, 3850.6582, 4227.797, 4156.6465, 4499.4263, 4143.8774, 4490.428, 4480.662, 4155.0825]
2025-05-13 13:16:52,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:16:52,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 26 minutes, 23 seconds)
2025-05-13 13:20:37,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:20:50,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4283.92090 ± 190.173
2025-05-13 13:20:50,129 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4132.3647, 4286.978, 4315.3467, 4322.15, 4032.2559, 4711.498, 4078.4033, 4431.5957, 4376.6846, 4151.9316]
2025-05-13 13:20:50,129 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:20:50,129 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4283.92) for latency MM1Queue_a033_s075
2025-05-13 13:20:50,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 22 minutes, 22 seconds)
2025-05-13 13:24:34,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:24:47,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4229.10986 ± 277.442
2025-05-13 13:24:47,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4622.254, 4213.433, 4353.421, 3520.5454, 4301.0635, 4445.6294, 4070.8079, 4199.211, 4199.5044, 4365.226]
2025-05-13 13:24:47,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:24:47,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 18 minutes, 23 seconds)
2025-05-13 13:28:31,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:28:44,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4349.15625 ± 169.575
2025-05-13 13:28:44,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4400.8438, 4535.373, 4420.3364, 4613.8433, 4159.5166, 4367.424, 4307.1694, 4175.1226, 4467.3506, 4044.582]
2025-05-13 13:28:44,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:28:44,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4349.16) for latency MM1Queue_a033_s075
2025-05-13 13:28:44,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 14 minutes, 26 seconds)
2025-05-13 13:32:29,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:32:42,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4124.21338 ± 510.092
2025-05-13 13:32:42,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2877.0625, 3631.4048, 4430.618, 4387.2104, 4599.006, 4460.3506, 4460.348, 3833.0662, 4115.748, 4447.3203]
2025-05-13 13:32:42,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:32:42,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 10 minutes, 29 seconds)
2025-05-13 13:36:26,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:36:39,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4365.32520 ± 286.171
2025-05-13 13:36:39,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4271.046, 4389.2085, 4380.07, 4499.013, 3720.795, 4512.4106, 4940.4497, 4212.509, 4353.5728, 4374.178]
2025-05-13 13:36:39,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:36:39,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4365.33) for latency MM1Queue_a033_s075
2025-05-13 13:36:39,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 6 minutes, 34 seconds)
2025-05-13 13:40:23,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:40:36,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3943.62109 ± 497.551
2025-05-13 13:40:36,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3682.0513, 4540.387, 3952.9482, 3906.9297, 4248.589, 3673.8677, 4020.2244, 2906.5051, 3699.3142, 4805.396]
2025-05-13 13:40:36,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:40:36,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 2 minutes, 38 seconds)
2025-05-13 13:44:21,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:44:33,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4199.64453 ± 301.600
2025-05-13 13:44:34,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3877.1577, 4717.5635, 4252.081, 3577.5708, 4464.4224, 4328.7534, 4044.131, 4112.7935, 4357.707, 4264.2637]
2025-05-13 13:44:34,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:44:34,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 58 minutes, 40 seconds)
2025-05-13 13:48:18,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:48:31,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4150.01465 ± 751.296
2025-05-13 13:48:31,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4145.8706, 4844.417, 2019.9877, 4292.265, 4737.2363, 4068.2083, 4476.4785, 4511.458, 4128.5386, 4275.6865]
2025-05-13 13:48:31,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:48:31,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 54 minutes, 42 seconds)
2025-05-13 13:52:15,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:52:28,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4095.87183 ± 162.105
2025-05-13 13:52:28,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4375.857, 3989.226, 4062.0781, 4256.074, 4197.3384, 4119.1445, 4245.696, 3867.2996, 3901.6384, 3944.3718]
2025-05-13 13:52:28,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:52:28,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 50 minutes, 45 seconds)
2025-05-13 13:56:13,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:56:26,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4116.61914 ± 330.036
2025-05-13 13:56:26,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3776.4639, 4016.2793, 3705.1973, 4641.774, 4464.0854, 4255.915, 3775.401, 4566.9326, 4088.9675, 3875.174]
2025-05-13 13:56:26,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:56:26,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 46 minutes, 50 seconds)
2025-05-13 14:00:11,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:00:24,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4222.10645 ± 221.430
2025-05-13 14:00:24,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3812.679, 4564.587, 4214.838, 4170.5576, 4223.813, 4527.2695, 3912.4578, 4220.5146, 4236.195, 4338.1523]
2025-05-13 14:00:24,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:00:24,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 42 minutes, 53 seconds)
2025-05-13 14:04:08,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:04:21,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4204.54688 ± 212.394
2025-05-13 14:04:21,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4361.432, 4192.214, 4207.0425, 3945.3667, 4369.1914, 4223.1885, 3988.8247, 3977.6484, 4677.245, 4103.3184]
2025-05-13 14:04:21,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:04:21,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 38 minutes, 57 seconds)
2025-05-13 14:08:05,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:08:18,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4485.11865 ± 224.768
2025-05-13 14:08:18,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4778.0137, 4788.0864, 4201.18, 4378.098, 4221.9526, 4205.466, 4519.6416, 4395.6953, 4651.931, 4711.1255]
2025-05-13 14:08:18,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:08:18,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4485.12) for latency MM1Queue_a033_s075
2025-05-13 14:08:18,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 34 minutes, 56 seconds)
2025-05-13 14:12:02,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:12:14,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4309.54053 ± 243.928
2025-05-13 14:12:14,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4549.329, 4374.298, 3967.7224, 4662.2676, 3924.5193, 4416.4663, 4107.6694, 4136.852, 4428.3433, 4527.9375]
2025-05-13 14:12:14,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:12:14,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 30 minutes, 55 seconds)
2025-05-13 14:15:59,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:16:12,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4153.11768 ± 281.295
2025-05-13 14:16:12,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4212.0366, 3775.0454, 3647.2034, 3957.6936, 4221.613, 4241.8213, 4334.007, 4626.671, 4429.8433, 4085.2437]
2025-05-13 14:16:12,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:16:12,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 26 minutes, 56 seconds)
2025-05-13 14:19:56,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:20:09,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4051.88745 ± 466.312
2025-05-13 14:20:09,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4134.411, 4557.931, 3879.7563, 4061.0562, 3756.2158, 4006.5767, 2945.811, 4336.099, 4091.533, 4749.485]
2025-05-13 14:20:09,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:20:09,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 22 minutes, 57 seconds)
2025-05-13 14:23:53,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:24:06,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3953.89502 ± 853.509
2025-05-13 14:24:06,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4304.5684, 1539.1838, 4143.3364, 4309.1396, 4430.9487, 4554.9375, 4528.1445, 4121.4194, 3506.0876, 4101.182]
2025-05-13 14:24:06,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:24:06,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 18 minutes, 58 seconds)
2025-05-13 14:27:50,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:28:03,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4176.42188 ± 587.810
2025-05-13 14:28:03,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4166.982, 2686.742, 4118.255, 4672.0317, 4258.0347, 4381.0664, 3723.2507, 4872.3394, 4673.9375, 4211.5767]
2025-05-13 14:28:03,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:28:03,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 15 minutes, 4 seconds)
2025-05-13 14:31:47,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:32:00,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3748.97607 ± 719.073
2025-05-13 14:32:00,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3872.2773, 4273.494, 4481.9517, 1780.6674, 3587.0537, 4124.0786, 3780.758, 3836.207, 3515.5942, 4237.6797]
2025-05-13 14:32:00,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:32:00,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 11 minutes, 7 seconds)
2025-05-13 14:35:44,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:35:57,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3789.71802 ± 701.404
2025-05-13 14:35:57,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4465.946, 4320.3315, 3859.713, 4452.4053, 2904.4983, 3930.8704, 4163.4897, 2511.0605, 4391.5522, 2897.317]
2025-05-13 14:35:57,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:35:57,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 7 minutes, 10 seconds)
2025-05-13 14:39:41,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:39:54,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3944.73486 ± 854.857
2025-05-13 14:39:54,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4322.436, 4215.0825, 4060.2737, 3980.1167, 4607.2554, 4041.7751, 1436.1235, 4250.3315, 4134.876, 4399.077]
2025-05-13 14:39:54,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:39:54,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 3 minutes, 12 seconds)
2025-05-13 14:43:38,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:43:51,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4318.62012 ± 264.451
2025-05-13 14:43:51,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4500.445, 4548.315, 4430.8086, 4478.494, 4701.296, 3756.8357, 4013.8828, 4185.587, 4301.84, 4268.701]
2025-05-13 14:43:51,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:43:51,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 59 minutes, 16 seconds)
2025-05-13 14:47:36,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:47:48,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4048.05664 ± 738.926
2025-05-13 14:47:48,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4668.1494, 4277.306, 4381.9517, 3583.4321, 4468.903, 2844.3032, 2553.8235, 4496.748, 4503.922, 4702.0283]
2025-05-13 14:47:48,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:47:48,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 55 minutes, 18 seconds)
2025-05-13 14:51:32,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:51:45,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4371.86035 ± 228.990
2025-05-13 14:51:45,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4124.811, 4608.5757, 4460.9644, 4356.7876, 4487.2485, 4074.1462, 4737.0903, 4260.062, 4037.81, 4571.108]
2025-05-13 14:51:45,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:51:45,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 51 minutes, 21 seconds)
2025-05-13 14:55:29,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:55:42,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4058.83789 ± 715.905
2025-05-13 14:55:42,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3289.1616, 4739.2695, 4035.3103, 4908.618, 4402.281, 4382.7666, 4332.8623, 4223.988, 3950.2463, 2323.8752]
2025-05-13 14:55:42,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:55:42,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 47 minutes, 23 seconds)
2025-05-13 14:59:26,138 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:59:39,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4259.52148 ± 177.939
2025-05-13 14:59:39,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3968.0737, 4210.936, 4055.2454, 4366.2075, 4173.147, 4492.976, 4548.781, 4347.547, 4118.5103, 4313.788]
2025-05-13 14:59:39,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:59:39,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 43 minutes, 26 seconds)
2025-05-13 15:03:23,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:03:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4408.60156 ± 200.717
2025-05-13 15:03:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4598.303, 4719.214, 4160.1646, 4223.772, 4465.369, 4559.872, 4513.39, 4317.6875, 4473.193, 4055.049]
2025-05-13 15:03:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:03:35,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 39 minutes, 28 seconds)
2025-05-13 15:07:19,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:07:32,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3785.94385 ± 985.709
2025-05-13 15:07:32,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4316.4067, 3539.7156, 4423.1626, 3167.2988, 1118.212, 4672.3003, 4104.0977, 4293.3887, 4348.122, 3876.734]
2025-05-13 15:07:32,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:07:32,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 35 minutes, 30 seconds)
2025-05-13 15:11:16,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:11:29,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4177.29932 ± 198.290
2025-05-13 15:11:29,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4112.4, 4516.171, 3992.696, 4405.5957, 3818.4795, 4262.2744, 4005.7817, 4159.373, 4326.3643, 4173.8545]
2025-05-13 15:11:29,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:11:29,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 31 minutes, 33 seconds)
2025-05-13 15:15:13,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:15:26,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4125.58301 ± 904.931
2025-05-13 15:15:26,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4693.3984, 4328.4673, 4471.1562, 4539.4053, 4250.003, 1522.7366, 4813.5156, 4392.8394, 4438.192, 3806.117]
2025-05-13 15:15:26,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:15:26,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 27 minutes, 37 seconds)
2025-05-13 15:19:10,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:19:22,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4384.18262 ± 257.386
2025-05-13 15:19:22,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3860.9387, 4203.831, 4138.9146, 4463.596, 4711.773, 4339.6055, 4497.4624, 4336.293, 4766.215, 4523.195]
2025-05-13 15:19:22,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:19:22,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 23 minutes, 40 seconds)
2025-05-13 15:23:06,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:23:19,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4436.73486 ± 169.896
2025-05-13 15:23:19,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4235.475, 4637.969, 4608.748, 4157.142, 4539.217, 4649.8413, 4540.4883, 4310.5576, 4349.6416, 4338.2656]
2025-05-13 15:23:19,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:23:19,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 19 minutes, 43 seconds)
2025-05-13 15:27:04,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:27:16,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4378.85449 ± 155.426
2025-05-13 15:27:16,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4352.8647, 4521.847, 4603.8623, 4399.0127, 4275.7183, 4584.4253, 4176.459, 4449.528, 4125.366, 4299.4634]
2025-05-13 15:27:16,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:27:16,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 47 seconds)
2025-05-13 15:31:02,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:31:15,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4502.92139 ± 214.246
2025-05-13 15:31:15,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4034.2678, 4774.464, 4473.115, 4603.8574, 4528.009, 4600.1333, 4275.4043, 4746.8853, 4625.2295, 4367.851]
2025-05-13 15:31:15,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:31:15,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4502.92) for latency MM1Queue_a033_s075
2025-05-13 15:31:15,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 51 seconds)
2025-05-13 15:35:02,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:35:15,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4345.04297 ± 257.882
2025-05-13 15:35:15,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4487.0996, 4357.206, 4614.83, 4472.957, 3887.0693, 4361.8125, 4688.453, 3906.7793, 4470.1577, 4204.067]
2025-05-13 15:35:15,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:35:15,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 55 seconds)
2025-05-13 15:39:02,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:39:15,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4050.79688 ± 1007.926
2025-05-13 15:39:15,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4600.49, 4650.806, 4006.645, 1096.835, 4214.465, 4532.8257, 4276.162, 4444.3423, 4083.9321, 4601.467]
2025-05-13 15:39:15,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:39:15,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 58 seconds)
2025-05-13 15:43:01,469 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:43:14,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4276.62646 ± 223.520
2025-05-13 15:43:14,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3885.8687, 4517.256, 4248.9883, 4222.645, 4224.948, 4309.648, 4526.7817, 4632.324, 4219.9565, 3977.847]
2025-05-13 15:43:14,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:43:14,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1251 [DEBUG]: Training session finished
