2025-05-13 09:06:31,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mda-mem24
2025-05-13 09:06:31,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mda-mem24
2025-05-13 09:06:31,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x153227eaa590>}
2025-05-13 09:06:31,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:31,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-13 09:06:31,065 baseline-bpql-mda-noisy-hopper:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-13 09:06:31,065 baseline-bpql-mda-noisy-hopper:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:31,070 baseline-bpql-mda-noisy-hopper:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(3, 384, batch_first=True)
)
2025-05-13 09:06:31,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:31,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-13 09:09:48,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:09:48,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 30.87459 ± 1.852
2025-05-13 09:09:48,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [34.12439, 30.809319, 29.093836, 29.541763, 28.588758, 29.532959, 32.2687, 30.89912, 33.88398, 30.003105]
2025-05-13 09:09:48,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [35.0, 34.0, 33.0, 34.0, 32.0, 34.0, 35.0, 34.0, 34.0, 35.0]
2025-05-13 09:09:48,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (30.87) for latency ExtremeClogL1U23
2025-05-13 09:09:48,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 25 minutes, 20 seconds)
2025-05-13 09:13:11,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:13:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 56.35806 ± 7.382
2025-05-13 09:13:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [51.125668, 58.149303, 55.826595, 55.66662, 48.22011, 65.23957, 65.97514, 62.136826, 41.219624, 60.021156]
2025-05-13 09:13:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 52.0, 50.0, 50.0, 43.0, 56.0, 53.0, 54.0, 43.0, 52.0]
2025-05-13 09:13:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (56.36) for latency ExtremeClogL1U23
2025-05-13 09:13:12,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 27 minutes, 31 seconds)
2025-05-13 09:16:36,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:16:38,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 161.27097 ± 65.113
2025-05-13 09:16:38,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [48.046204, 245.79922, 110.50044, 140.53279, 132.29376, 126.10947, 126.78963, 253.11172, 246.40582, 183.1206]
2025-05-13 09:16:38,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [49.0, 139.0, 93.0, 99.0, 95.0, 102.0, 93.0, 158.0, 138.0, 139.0]
2025-05-13 09:16:38,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (161.27) for latency ExtremeClogL1U23
2025-05-13 09:16:38,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 27 minutes, 5 seconds)
2025-05-13 09:20:01,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:20:02,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 126.54183 ± 23.761
2025-05-13 09:20:02,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [175.32971, 131.30423, 142.55908, 146.84256, 108.883545, 111.415886, 87.70372, 136.49284, 116.22395, 108.662766]
2025-05-13 09:20:02,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 96.0, 98.0, 99.0, 89.0, 80.0, 92.0, 100.0, 89.0, 80.0]
2025-05-13 09:20:02,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 24 minutes, 28 seconds)
2025-05-13 09:23:25,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:23:27,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 96.45757 ± 44.015
2025-05-13 09:23:27,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [73.57025, 56.460438, 97.291504, 165.45038, 189.99417, 83.697, 62.86377, 79.56517, 50.732166, 104.950806]
2025-05-13 09:23:27,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 53.0, 78.0, 97.0, 110.0, 58.0, 56.0, 61.0, 47.0, 77.0]
2025-05-13 09:23:27,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 21 minutes, 32 seconds)
2025-05-13 09:26:52,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:26:54,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 152.75412 ± 25.813
2025-05-13 09:26:54,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [109.854645, 148.19994, 160.49992, 128.81215, 166.35013, 184.22101, 115.2821, 153.8613, 173.95706, 186.50291]
2025-05-13 09:26:54,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 100.0, 101.0, 90.0, 114.0, 126.0, 87.0, 106.0, 100.0, 125.0]
2025-05-13 09:26:54,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 21 minutes, 21 seconds)
2025-05-13 09:30:17,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:30:19,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 159.96346 ± 56.724
2025-05-13 09:30:19,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [94.84587, 134.14838, 189.59451, 173.76366, 182.3755, 259.47397, 196.8564, 47.545113, 130.53918, 190.49194]
2025-05-13 09:30:19,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 89.0, 113.0, 122.0, 112.0, 138.0, 123.0, 44.0, 96.0, 114.0]
2025-05-13 09:30:19,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 18 minutes, 21 seconds)
2025-05-13 09:33:43,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:33:45,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 131.81993 ± 58.250
2025-05-13 09:33:45,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [104.03524, 223.17317, 105.10628, 37.684315, 93.42564, 102.380135, 168.3057, 228.02214, 94.250084, 161.81662]
2025-05-13 09:33:45,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 123.0, 77.0, 38.0, 77.0, 80.0, 104.0, 142.0, 78.0, 98.0]
2025-05-13 09:33:45,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 14 minutes, 48 seconds)
2025-05-13 09:37:09,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:37:11,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 169.72598 ± 63.496
2025-05-13 09:37:11,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [72.05971, 115.39956, 162.57797, 163.02367, 242.07529, 283.04782, 163.84714, 100.58646, 238.0448, 156.59746]
2025-05-13 09:37:11,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 75.0, 99.0, 99.0, 129.0, 139.0, 104.0, 71.0, 136.0, 96.0]
2025-05-13 09:37:11,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (169.73) for latency ExtremeClogL1U23
2025-05-13 09:37:11,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 11 minutes, 52 seconds)
2025-05-13 09:40:34,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:40:36,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 186.29941 ± 96.027
2025-05-13 09:40:36,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [101.18071, 82.30456, 21.32112, 273.66812, 196.37454, 140.22641, 320.32178, 208.06514, 322.39807, 197.13364]
2025-05-13 09:40:36,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 64.0, 25.0, 134.0, 125.0, 89.0, 146.0, 136.0, 157.0, 108.0]
2025-05-13 09:40:36,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (186.30) for latency ExtremeClogL1U23
2025-05-13 09:40:36,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 8 minutes, 46 seconds)
2025-05-13 09:44:00,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:44:01,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 128.47652 ± 74.128
2025-05-13 09:44:01,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [59.503937, 79.91531, 79.2965, 264.7939, 70.98431, 163.34087, 72.503006, 158.98676, 254.80771, 80.63286]
2025-05-13 09:44:01,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 65.0, 68.0, 138.0, 62.0, 101.0, 60.0, 98.0, 137.0, 65.0]
2025-05-13 09:44:01,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 4 minutes, 48 seconds)
2025-05-13 09:47:24,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:47:26,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 152.19353 ± 46.929
2025-05-13 09:47:26,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [234.03268, 101.96048, 102.93144, 151.236, 110.297195, 219.50943, 106.084175, 157.89822, 142.87105, 195.11455]
2025-05-13 09:47:26,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 76.0, 75.0, 113.0, 81.0, 147.0, 79.0, 110.0, 99.0, 144.0]
2025-05-13 09:47:26,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 1 minute, 14 seconds)
2025-05-13 09:50:49,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:50:51,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 198.76727 ± 59.072
2025-05-13 09:50:51,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [143.7315, 185.68483, 265.56024, 94.373955, 216.97632, 315.16983, 163.93863, 217.9496, 172.31676, 211.97108]
2025-05-13 09:50:51,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 107.0, 133.0, 70.0, 112.0, 143.0, 106.0, 117.0, 112.0, 124.0]
2025-05-13 09:50:51,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (198.77) for latency ExtremeClogL1U23
2025-05-13 09:50:51,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 57 minutes, 34 seconds)
2025-05-13 09:54:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:54:15,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 177.85460 ± 62.510
2025-05-13 09:54:15,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [165.89503, 224.30568, 157.54073, 308.90494, 162.08537, 140.17024, 160.21513, 252.2404, 85.635216, 121.5532]
2025-05-13 09:54:15,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [110.0, 129.0, 113.0, 148.0, 103.0, 89.0, 93.0, 136.0, 65.0, 77.0]
2025-05-13 09:54:15,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 53 minutes, 35 seconds)
2025-05-13 09:57:37,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:57:40,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 261.27643 ± 55.367
2025-05-13 09:57:40,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [184.65283, 167.66245, 317.3381, 300.85114, 272.9377, 317.4216, 241.38141, 239.20108, 231.11311, 340.20517]
2025-05-13 09:57:40,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 101.0, 141.0, 136.0, 138.0, 145.0, 125.0, 135.0, 128.0, 149.0]
2025-05-13 09:57:40,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (261.28) for latency ExtremeClogL1U23
2025-05-13 09:57:40,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 50 minutes, 7 seconds)
2025-05-13 10:01:00,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:01:03,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 225.91096 ± 86.608
2025-05-13 10:01:03,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [466.31796, 213.53232, 183.78267, 270.3874, 145.03204, 171.27885, 211.9189, 194.7885, 227.30913, 174.76207]
2025-05-13 10:01:03,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 115.0, 112.0, 133.0, 99.0, 109.0, 125.0, 118.0, 126.0, 95.0]
2025-05-13 10:01:03,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 45 minutes, 54 seconds)
2025-05-13 10:04:28,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:04:30,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 244.61621 ± 73.777
2025-05-13 10:04:30,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [220.9828, 318.98434, 196.9541, 315.83624, 207.19435, 372.34195, 123.199135, 298.0485, 173.74101, 218.87971]
2025-05-13 10:04:30,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 150.0, 121.0, 174.0, 143.0, 188.0, 86.0, 165.0, 131.0, 139.0]
2025-05-13 10:04:31,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 43 minutes, 24 seconds)
2025-05-13 10:07:50,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:07:52,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 160.45337 ± 73.058
2025-05-13 10:07:52,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [75.77047, 167.0689, 77.09197, 298.56406, 175.35907, 221.15857, 109.875786, 253.71556, 97.763466, 128.16591]
2025-05-13 10:07:52,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [51.0, 106.0, 55.0, 156.0, 103.0, 125.0, 69.0, 134.0, 64.0, 78.0]
2025-05-13 10:07:52,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 39 minutes)
2025-05-13 10:11:15,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:11:17,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 198.47643 ± 72.893
2025-05-13 10:11:17,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [172.75345, 101.78431, 191.06792, 138.33739, 216.83603, 250.18265, 187.43878, 185.44669, 156.91154, 384.00565]
2025-05-13 10:11:17,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 76.0, 107.0, 94.0, 126.0, 146.0, 109.0, 105.0, 99.0, 169.0]
2025-05-13 10:11:17,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 35 minutes, 57 seconds)
2025-05-13 10:14:37,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:14:39,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 220.63818 ± 64.620
2025-05-13 10:14:39,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [159.48419, 234.45999, 232.42432, 105.67474, 335.3068, 307.5609, 240.87627, 224.75114, 167.19055, 198.6529]
2025-05-13 10:14:39,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 117.0, 125.0, 76.0, 166.0, 135.0, 130.0, 118.0, 100.0, 111.0]
2025-05-13 10:14:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 31 minutes, 46 seconds)
2025-05-13 10:18:02,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:18:04,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 213.04337 ± 63.432
2025-05-13 10:18:04,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [268.49506, 144.36142, 156.26309, 250.98611, 212.51692, 265.90933, 182.6913, 293.79004, 90.96975, 264.4506]
2025-05-13 10:18:04,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 85.0, 96.0, 142.0, 116.0, 132.0, 96.0, 141.0, 70.0, 129.0]
2025-05-13 10:18:04,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 28 minutes, 53 seconds)
2025-05-13 10:21:26,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:21:28,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 202.09322 ± 39.525
2025-05-13 10:21:28,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [257.1221, 263.71002, 205.76373, 210.61414, 228.14365, 159.69972, 136.70876, 158.41432, 195.54068, 205.21498]
2025-05-13 10:21:28,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 141.0, 127.0, 118.0, 139.0, 101.0, 92.0, 94.0, 114.0, 113.0]
2025-05-13 10:21:28,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 24 minutes, 38 seconds)
2025-05-13 10:24:51,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:24:53,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 248.41220 ± 83.557
2025-05-13 10:24:53,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [271.52072, 330.86108, 188.58032, 170.02803, 172.75322, 118.22426, 242.81532, 376.19348, 365.58118, 247.56438]
2025-05-13 10:24:53,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 145.0, 108.0, 101.0, 102.0, 75.0, 125.0, 176.0, 153.0, 129.0]
2025-05-13 10:24:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 22 minutes, 6 seconds)
2025-05-13 10:28:14,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:28:16,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 235.41089 ± 110.519
2025-05-13 10:28:16,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [98.61622, 413.259, 378.76746, 92.919556, 299.55005, 203.16609, 86.73922, 267.91467, 227.12868, 286.04788]
2025-05-13 10:28:16,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 175.0, 167.0, 68.0, 138.0, 113.0, 64.0, 138.0, 118.0, 140.0]
2025-05-13 10:28:16,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 18 minutes, 19 seconds)
2025-05-13 10:31:39,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:31:41,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 268.49466 ± 95.193
2025-05-13 10:31:41,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [186.18126, 297.1363, 98.554985, 202.43747, 302.99536, 230.05643, 450.3995, 362.88965, 326.95346, 227.34236]
2025-05-13 10:31:41,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 138.0, 71.0, 104.0, 147.0, 117.0, 223.0, 173.0, 152.0, 121.0]
2025-05-13 10:31:41,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (268.49) for latency ExtremeClogL1U23
2025-05-13 10:31:41,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 15 minutes, 31 seconds)
2025-05-13 10:35:05,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:35:07,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 236.84517 ± 82.383
2025-05-13 10:35:07,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [311.5717, 77.39239, 125.514084, 305.8118, 286.4447, 306.01678, 158.07227, 292.96875, 216.06175, 288.59772]
2025-05-13 10:35:07,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 55.0, 93.0, 158.0, 145.0, 138.0, 99.0, 153.0, 113.0, 150.0]
2025-05-13 10:35:07,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 12 minutes, 24 seconds)
2025-05-13 10:38:30,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:38:32,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 254.79765 ± 83.963
2025-05-13 10:38:32,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [342.81778, 281.56934, 244.50307, 292.9241, 280.81952, 256.17465, 301.49887, 237.87807, 291.58414, 18.207148]
2025-05-13 10:38:32,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 141.0, 124.0, 141.0, 142.0, 138.0, 137.0, 120.0, 143.0, 22.0]
2025-05-13 10:38:32,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 9 minutes, 7 seconds)
2025-05-13 10:41:52,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:41:55,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 307.96466 ± 96.417
2025-05-13 10:41:55,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [249.06837, 418.62845, 244.43115, 260.22513, 180.23302, 511.45505, 363.72867, 335.87476, 304.50845, 211.49341]
2025-05-13 10:41:55,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 188.0, 130.0, 144.0, 118.0, 223.0, 168.0, 149.0, 158.0, 114.0]
2025-05-13 10:41:55,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (307.96) for latency ExtremeClogL1U23
2025-05-13 10:41:55,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 5 minutes, 21 seconds)
2025-05-13 10:45:17,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:45:19,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 145.44521 ± 21.568
2025-05-13 10:45:19,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [175.2103, 113.50173, 137.52475, 113.31932, 137.95183, 151.16718, 173.93375, 132.0297, 168.62, 151.19353]
2025-05-13 10:45:19,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 79.0, 96.0, 79.0, 92.0, 101.0, 100.0, 91.0, 107.0, 100.0]
2025-05-13 10:45:19,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 1 minute, 59 seconds)
2025-05-13 10:48:43,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:48:45,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 253.55205 ± 73.955
2025-05-13 10:48:45,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [180.25256, 249.5343, 441.00443, 184.31085, 268.2904, 260.26523, 208.03157, 308.04205, 245.50792, 190.28131]
2025-05-13 10:48:45,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 135.0, 184.0, 105.0, 123.0, 137.0, 134.0, 146.0, 140.0, 110.0]
2025-05-13 10:48:45,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 58 minutes, 55 seconds)
2025-05-13 10:52:08,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:52:10,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 177.71782 ± 26.562
2025-05-13 10:52:10,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [194.29666, 193.08678, 183.61101, 183.24536, 131.46509, 201.31151, 197.48584, 122.642525, 196.24814, 173.78532]
2025-05-13 10:52:10,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 110.0, 103.0, 104.0, 96.0, 111.0, 112.0, 85.0, 111.0, 106.0]
2025-05-13 10:52:10,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 55 minutes, 12 seconds)
2025-05-13 10:55:30,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:55:32,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 214.99080 ± 82.395
2025-05-13 10:55:32,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [184.72304, 338.6264, 183.94112, 262.19952, 326.98993, 88.62453, 240.87598, 263.13974, 93.7862, 167.00156]
2025-05-13 10:55:32,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [110.0, 158.0, 108.0, 141.0, 151.0, 62.0, 127.0, 138.0, 68.0, 102.0]
2025-05-13 10:55:32,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 51 minutes, 14 seconds)
2025-05-13 10:58:56,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:58:58,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 237.13675 ± 65.240
2025-05-13 10:58:58,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [203.07603, 347.64996, 184.40233, 175.35239, 307.02182, 182.27307, 225.1342, 228.22105, 341.29916, 176.93756]
2025-05-13 10:58:58,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 163.0, 106.0, 101.0, 154.0, 103.0, 119.0, 127.0, 161.0, 104.0]
2025-05-13 10:58:58,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 48 minutes, 31 seconds)
2025-05-13 11:02:20,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:02:22,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 170.35587 ± 7.660
2025-05-13 11:02:22,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [179.05528, 170.9443, 166.54263, 157.61488, 176.50153, 173.8209, 180.90237, 170.43729, 157.20897, 170.53044]
2025-05-13 11:02:22,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 99.0, 100.0, 95.0, 106.0, 105.0, 107.0, 102.0, 94.0, 93.0]
2025-05-13 11:02:22,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 45 minutes, 1 second)
2025-05-13 11:05:46,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:05:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 219.00671 ± 87.831
2025-05-13 11:05:48,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [278.90826, 305.63608, 133.74507, 86.50537, 367.06525, 197.46513, 93.753296, 263.99634, 251.67345, 211.3189]
2025-05-13 11:05:48,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 143.0, 85.0, 62.0, 176.0, 107.0, 64.0, 128.0, 125.0, 106.0]
2025-05-13 11:05:48,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 41 minutes, 37 seconds)
2025-05-13 11:09:08,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:09:11,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 236.03354 ± 85.046
2025-05-13 11:09:11,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [286.57825, 168.70908, 181.06331, 352.64026, 210.03546, 191.16638, 153.7085, 425.1596, 203.50967, 187.76486]
2025-05-13 11:09:11,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 98.0, 100.0, 181.0, 118.0, 105.0, 90.0, 185.0, 110.0, 99.0]
2025-05-13 11:09:11,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 37 minutes, 47 seconds)
2025-05-13 11:12:36,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:12:38,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 230.70329 ± 117.925
2025-05-13 11:12:38,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [22.052591, 400.04758, 160.83218, 184.00308, 188.35208, 255.58435, 374.06305, 308.44632, 85.09043, 328.56137]
2025-05-13 11:12:38,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 184.0, 98.0, 107.0, 113.0, 141.0, 193.0, 152.0, 61.0, 168.0]
2025-05-13 11:12:38,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 35 minutes, 27 seconds)
2025-05-13 11:15:58,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:16:01,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 241.10234 ± 73.123
2025-05-13 11:16:01,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [119.485695, 152.42834, 198.353, 296.74704, 312.59753, 194.50383, 374.3898, 268.05942, 251.82343, 242.63538]
2025-05-13 11:16:01,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 105.0, 120.0, 141.0, 153.0, 111.0, 175.0, 139.0, 128.0, 142.0]
2025-05-13 11:16:01,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 31 minutes, 16 seconds)
2025-05-13 11:19:23,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:19:25,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 267.92719 ± 97.375
2025-05-13 11:19:25,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [276.46765, 191.78285, 403.58658, 483.72086, 195.63669, 196.21817, 302.3243, 186.1716, 194.61067, 248.75279]
2025-05-13 11:19:25,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 115.0, 185.0, 214.0, 108.0, 113.0, 145.0, 110.0, 113.0, 130.0]
2025-05-13 11:19:25,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 28 minutes, 2 seconds)
2025-05-13 11:22:48,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:22:51,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 353.06323 ± 117.598
2025-05-13 11:22:51,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [410.16718, 207.6189, 479.56613, 389.948, 212.75554, 256.83118, 338.6052, 274.41296, 360.4435, 600.2833]
2025-05-13 11:22:51,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 121.0, 208.0, 176.0, 119.0, 130.0, 169.0, 131.0, 156.0, 255.0]
2025-05-13 11:22:51,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (353.06) for latency ExtremeClogL1U23
2025-05-13 11:22:51,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 24 minutes, 33 seconds)
2025-05-13 11:26:12,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:26:14,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 250.60538 ± 90.432
2025-05-13 11:26:14,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [322.26413, 185.71649, 321.2633, 204.35248, 146.64432, 281.66846, 334.23087, 221.77205, 395.0988, 93.0426]
2025-05-13 11:26:14,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [147.0, 102.0, 158.0, 116.0, 95.0, 146.0, 151.0, 111.0, 173.0, 65.0]
2025-05-13 11:26:14,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 21 minutes, 21 seconds)
2025-05-13 11:29:40,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:29:42,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 225.25223 ± 127.810
2025-05-13 11:29:42,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [208.88351, 167.61838, 117.36221, 568.2084, 285.37805, 262.37708, 86.696884, 207.18755, 154.20673, 194.60358]
2025-05-13 11:29:42,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 100.0, 84.0, 248.0, 146.0, 141.0, 59.0, 118.0, 97.0, 108.0]
2025-05-13 11:29:42,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 17 minutes, 55 seconds)
2025-05-13 11:33:02,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:33:04,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 281.66251 ± 75.011
2025-05-13 11:33:04,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [364.8286, 385.26837, 288.6002, 351.5281, 197.427, 319.70697, 326.7278, 187.34319, 191.20074, 203.99419]
2025-05-13 11:33:04,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [162.0, 182.0, 152.0, 164.0, 112.0, 158.0, 162.0, 107.0, 107.0, 112.0]
2025-05-13 11:33:04,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 14 minutes, 27 seconds)
2025-05-13 11:36:29,338 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:36:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 306.76279 ± 94.755
2025-05-13 11:36:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [265.8469, 203.85953, 365.99167, 397.4931, 217.36517, 488.54907, 210.43489, 213.0784, 387.88095, 317.12842]
2025-05-13 11:36:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 126.0, 166.0, 178.0, 122.0, 229.0, 120.0, 120.0, 180.0, 156.0]
2025-05-13 11:36:32,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 11 minutes, 38 seconds)
2025-05-13 11:39:53,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:39:56,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 282.55249 ± 122.294
2025-05-13 11:39:56,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [201.70226, 394.40472, 119.902916, 213.95912, 320.07703, 513.2261, 279.8776, 248.6561, 413.63037, 120.088905]
2025-05-13 11:39:56,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 190.0, 82.0, 118.0, 152.0, 218.0, 133.0, 130.0, 185.0, 84.0]
2025-05-13 11:39:56,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 7 minutes, 57 seconds)
2025-05-13 11:43:18,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:43:20,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 254.70273 ± 104.681
2025-05-13 11:43:20,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [173.9583, 368.09003, 217.3978, 172.45572, 192.09583, 259.49673, 282.25903, 191.44017, 513.83716, 175.99655]
2025-05-13 11:43:20,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 185.0, 116.0, 110.0, 107.0, 146.0, 145.0, 112.0, 214.0, 105.0]
2025-05-13 11:43:20,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 4 minutes, 40 seconds)
2025-05-13 11:46:42,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:46:45,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 307.43008 ± 132.832
2025-05-13 11:46:45,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [289.08276, 199.87338, 260.07327, 273.13498, 257.72763, 268.0795, 258.26276, 248.42587, 323.704, 695.9365]
2025-05-13 11:46:45,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 117.0, 140.0, 130.0, 136.0, 136.0, 129.0, 131.0, 174.0, 301.0]
2025-05-13 11:46:45,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 38 seconds)
2025-05-13 11:50:08,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:50:11,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 247.77734 ± 113.077
2025-05-13 11:50:11,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [169.84915, 186.974, 92.29016, 274.41907, 238.93558, 159.35997, 194.582, 365.80707, 289.26047, 506.2959]
2025-05-13 11:50:11,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 109.0, 70.0, 145.0, 131.0, 97.0, 111.0, 183.0, 146.0, 266.0]
2025-05-13 11:50:11,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 57 minutes, 56 seconds)
2025-05-13 11:53:34,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:53:37,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 295.73102 ± 127.328
2025-05-13 11:53:37,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [172.30905, 428.55945, 125.14856, 382.9522, 213.42587, 501.28372, 253.23749, 459.63507, 222.10869, 198.65012]
2025-05-13 11:53:37,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 197.0, 84.0, 179.0, 124.0, 212.0, 138.0, 190.0, 126.0, 113.0]
2025-05-13 11:53:37,468 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 54 minutes, 17 seconds)
2025-05-13 11:56:58,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:57:01,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 315.18185 ± 120.717
2025-05-13 11:57:01,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [457.76147, 318.71512, 411.56155, 323.04868, 332.7675, 195.06976, 417.1639, 174.59177, 441.0783, 80.060555]
2025-05-13 11:57:01,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [203.0, 150.0, 186.0, 166.0, 153.0, 114.0, 180.0, 110.0, 199.0, 59.0]
2025-05-13 11:57:01,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 50 minutes, 47 seconds)
2025-05-13 12:00:23,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:00:26,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 334.11615 ± 105.051
2025-05-13 12:00:26,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [224.58862, 502.3815, 391.7112, 430.8993, 238.94739, 371.24802, 208.26625, 439.52274, 192.92479, 340.67197]
2025-05-13 12:00:26,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 215.0, 170.0, 189.0, 118.0, 171.0, 110.0, 190.0, 109.0, 160.0]
2025-05-13 12:00:26,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 47 minutes, 31 seconds)
2025-05-13 12:03:48,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:03:51,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 325.18301 ± 80.752
2025-05-13 12:03:51,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [368.94763, 347.61783, 196.33345, 204.84622, 391.25247, 378.97046, 283.51703, 395.6246, 249.7303, 434.9903]
2025-05-13 12:03:51,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 165.0, 110.0, 113.0, 192.0, 162.0, 144.0, 175.0, 146.0, 183.0]
2025-05-13 12:03:51,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 44 minutes, 13 seconds)
2025-05-13 12:07:16,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:07:18,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 265.06735 ± 104.768
2025-05-13 12:07:18,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [316.0129, 478.45123, 143.49124, 191.33704, 205.47746, 148.52852, 322.06375, 182.95831, 383.8796, 278.4736]
2025-05-13 12:07:18,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 213.0, 90.0, 107.0, 112.0, 89.0, 151.0, 103.0, 167.0, 135.0]
2025-05-13 12:07:18,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 40 minutes, 55 seconds)
2025-05-13 12:10:41,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:10:44,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 353.08813 ± 123.448
2025-05-13 12:10:44,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [346.7887, 391.5865, 367.33203, 231.85396, 568.08954, 481.00897, 480.5183, 245.86841, 163.767, 254.06798]
2025-05-13 12:10:44,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 168.0, 164.0, 132.0, 237.0, 203.0, 200.0, 126.0, 99.0, 133.0]
2025-05-13 12:10:44,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (353.09) for latency ExtremeClogL1U23
2025-05-13 12:10:44,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 37 minutes, 27 seconds)
2025-05-13 12:14:03,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:14:06,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 247.59109 ± 59.602
2025-05-13 12:14:06,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [247.39778, 228.63759, 262.25372, 149.33984, 272.4653, 279.4626, 346.24274, 155.51805, 315.6504, 218.94292]
2025-05-13 12:14:06,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 116.0, 125.0, 91.0, 131.0, 137.0, 158.0, 99.0, 159.0, 114.0]
2025-05-13 12:14:06,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 33 minutes, 44 seconds)
2025-05-13 12:17:30,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:17:32,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 344.65656 ± 102.086
2025-05-13 12:17:32,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [442.98248, 426.03998, 223.4294, 450.6256, 385.57513, 335.7379, 194.62154, 473.36066, 311.11996, 203.07309]
2025-05-13 12:17:32,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [185.0, 184.0, 121.0, 198.0, 182.0, 148.0, 107.0, 202.0, 147.0, 133.0]
2025-05-13 12:17:32,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 30 minutes, 32 seconds)
2025-05-13 12:20:55,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:20:58,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 275.66595 ± 111.040
2025-05-13 12:20:58,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [213.47119, 353.328, 446.84332, 187.8555, 475.17355, 225.28668, 196.70583, 329.53174, 170.29498, 158.16864]
2025-05-13 12:20:58,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [121.0, 162.0, 195.0, 108.0, 200.0, 116.0, 110.0, 151.0, 101.0, 94.0]
2025-05-13 12:20:58,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 27 minutes, 9 seconds)
2025-05-13 12:24:21,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:24:23,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 279.63791 ± 86.059
2025-05-13 12:24:23,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [390.9544, 202.24208, 211.9789, 215.56454, 208.95474, 199.38208, 248.12279, 326.28787, 345.94153, 446.9505]
2025-05-13 12:24:23,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 109.0, 122.0, 126.0, 114.0, 110.0, 133.0, 150.0, 180.0, 209.0]
2025-05-13 12:24:23,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 23 minutes, 32 seconds)
2025-05-13 12:27:45,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:27:48,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 367.20645 ± 77.442
2025-05-13 12:27:48,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [202.78838, 373.83392, 376.65982, 367.7448, 365.516, 311.15073, 528.95374, 425.45844, 360.322, 359.63638]
2025-05-13 12:27:48,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 164.0, 170.0, 160.0, 163.0, 145.0, 229.0, 178.0, 163.0, 158.0]
2025-05-13 12:27:48,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (367.21) for latency ExtremeClogL1U23
2025-05-13 12:27:48,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 19 minutes, 57 seconds)
2025-05-13 12:31:12,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:31:14,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 298.52127 ± 125.170
2025-05-13 12:31:14,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [491.45154, 422.52695, 152.45715, 254.9741, 339.16138, 184.34967, 327.3356, 475.96548, 172.18088, 164.80989]
2025-05-13 12:31:14,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 192.0, 96.0, 138.0, 153.0, 105.0, 156.0, 205.0, 98.0, 94.0]
2025-05-13 12:31:14,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 17 minutes, 11 seconds)
2025-05-13 12:34:35,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:34:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 348.81354 ± 100.043
2025-05-13 12:34:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [286.4137, 302.98334, 561.42615, 415.13666, 307.0561, 195.88153, 445.79608, 257.79657, 330.39117, 385.254]
2025-05-13 12:34:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 151.0, 235.0, 177.0, 154.0, 110.0, 199.0, 139.0, 150.0, 174.0]
2025-05-13 12:34:38,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 13 minutes, 18 seconds)
2025-05-13 12:38:03,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:38:06,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 284.98199 ± 124.605
2025-05-13 12:38:06,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [405.16214, 377.15698, 182.6749, 291.91125, 213.38283, 128.62303, 524.15924, 178.70897, 382.40765, 165.63304]
2025-05-13 12:38:06,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 178.0, 110.0, 145.0, 121.0, 80.0, 226.0, 98.0, 175.0, 99.0]
2025-05-13 12:38:06,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 10 minutes, 10 seconds)
2025-05-13 12:41:29,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:41:31,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 272.64420 ± 167.633
2025-05-13 12:41:31,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [178.31847, 166.73834, 153.90656, 196.35141, 650.68695, 382.32376, 171.21545, 500.30133, 162.35548, 164.24405]
2025-05-13 12:41:31,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 101.0, 91.0, 113.0, 268.0, 176.0, 98.0, 226.0, 96.0, 103.0]
2025-05-13 12:41:31,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 6 minutes, 45 seconds)
2025-05-13 12:44:53,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:44:56,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 336.45233 ± 192.282
2025-05-13 12:44:56,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [117.92912, 262.14322, 729.101, 132.53943, 167.04787, 370.73816, 596.72845, 208.41566, 367.9884, 411.89185]
2025-05-13 12:44:56,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 132.0, 288.0, 90.0, 110.0, 171.0, 257.0, 116.0, 173.0, 184.0]
2025-05-13 12:44:56,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 3 minutes, 23 seconds)
2025-05-13 12:48:19,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:48:22,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 363.66772 ± 177.645
2025-05-13 12:48:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [108.7569, 379.21686, 428.34955, 587.1796, 595.532, 413.9382, 425.84158, 468.26883, 139.75497, 89.83891]
2025-05-13 12:48:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 169.0, 183.0, 237.0, 240.0, 173.0, 186.0, 202.0, 89.0, 67.0]
2025-05-13 12:48:22,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 59 minutes, 49 seconds)
2025-05-13 12:51:43,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:51:46,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 288.20486 ± 91.014
2025-05-13 12:51:46,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [385.11862, 185.26901, 332.66608, 417.90427, 187.12811, 218.21436, 213.40962, 374.39856, 373.0573, 194.88263]
2025-05-13 12:51:46,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 106.0, 157.0, 179.0, 109.0, 121.0, 122.0, 172.0, 167.0, 114.0]
2025-05-13 12:51:46,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 56 minutes, 28 seconds)
2025-05-13 12:55:13,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:55:16,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 354.54257 ± 177.641
2025-05-13 12:55:16,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [270.07693, 156.27914, 204.69547, 697.3726, 242.62843, 483.8268, 279.42865, 407.92636, 193.05902, 610.1323]
2025-05-13 12:55:16,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 94.0, 117.0, 272.0, 127.0, 221.0, 136.0, 181.0, 116.0, 271.0]
2025-05-13 12:55:16,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 53 minutes, 20 seconds)
2025-05-13 12:58:34,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:58:37,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 267.55209 ± 129.098
2025-05-13 12:58:37,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [430.7331, 509.7804, 437.2953, 200.6939, 192.05832, 154.769, 145.67404, 180.42873, 189.11804, 234.97034]
2025-05-13 12:58:37,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [203.0, 276.0, 203.0, 110.0, 115.0, 97.0, 103.0, 108.0, 116.0, 129.0]
2025-05-13 12:58:37,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 49 minutes, 27 seconds)
2025-05-13 13:02:03,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:02:05,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 292.83469 ± 98.975
2025-05-13 13:02:05,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [165.18683, 355.7041, 350.78232, 360.4025, 248.07753, 184.5227, 445.33472, 181.58951, 220.59648, 416.1504]
2025-05-13 13:02:05,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 160.0, 166.0, 183.0, 129.0, 116.0, 194.0, 106.0, 135.0, 184.0]
2025-05-13 13:02:05,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 46 minutes, 21 seconds)
2025-05-13 13:05:29,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:05:31,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 258.38898 ± 125.129
2025-05-13 13:05:31,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [478.4873, 214.1348, 168.81024, 498.47134, 159.06589, 238.59, 246.33945, 206.77708, 279.17267, 94.04094]
2025-05-13 13:05:31,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 121.0, 97.0, 223.0, 107.0, 122.0, 122.0, 116.0, 150.0, 70.0]
2025-05-13 13:05:31,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 42 minutes, 59 seconds)
2025-05-13 13:08:50,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:08:53,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 322.33740 ± 105.682
2025-05-13 13:08:53,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [536.89935, 395.6274, 206.55058, 279.28662, 203.80981, 364.09286, 404.40112, 253.20932, 201.4635, 378.03308]
2025-05-13 13:08:53,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 184.0, 108.0, 141.0, 115.0, 169.0, 171.0, 128.0, 115.0, 173.0]
2025-05-13 13:08:53,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 39 minutes, 19 seconds)
2025-05-13 13:12:18,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:12:21,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 329.38776 ± 102.180
2025-05-13 13:12:21,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [466.80756, 283.0149, 332.9587, 189.75235, 280.34866, 293.7067, 542.8721, 326.3101, 364.37192, 213.73492]
2025-05-13 13:12:21,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 143.0, 161.0, 108.0, 142.0, 152.0, 240.0, 160.0, 163.0, 117.0]
2025-05-13 13:12:21,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 35 minutes, 41 seconds)
2025-05-13 13:15:44,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:15:46,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 262.03226 ± 139.993
2025-05-13 13:15:46,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [165.88081, 358.38882, 161.32924, 611.0551, 182.48627, 168.13232, 386.03476, 168.97191, 227.61874, 190.42456]
2025-05-13 13:15:46,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 179.0, 94.0, 255.0, 103.0, 103.0, 187.0, 101.0, 127.0, 109.0]
2025-05-13 13:15:46,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 32 minutes, 37 seconds)
2025-05-13 13:19:09,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:19:12,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 254.70560 ± 73.254
2025-05-13 13:19:12,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [248.89804, 175.77179, 365.3509, 159.27094, 361.67352, 216.1776, 262.50684, 214.48857, 345.21588, 197.70201]
2025-05-13 13:19:12,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 102.0, 171.0, 104.0, 166.0, 120.0, 130.0, 109.0, 152.0, 114.0]
2025-05-13 13:19:12,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 28 minutes, 56 seconds)
2025-05-13 13:22:33,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:22:36,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 295.26080 ± 132.780
2025-05-13 13:22:36,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [210.73004, 347.37125, 148.27202, 191.84262, 199.16629, 209.16844, 210.77702, 439.5548, 436.85263, 558.87286]
2025-05-13 13:22:36,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 160.0, 97.0, 110.0, 117.0, 127.0, 120.0, 188.0, 198.0, 214.0]
2025-05-13 13:22:36,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 25 minutes, 22 seconds)
2025-05-13 13:25:59,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:26:02,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 326.00381 ± 125.694
2025-05-13 13:26:02,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [443.75256, 457.6539, 171.00435, 398.24045, 201.28967, 176.09477, 506.4705, 382.41165, 173.61697, 349.5033]
2025-05-13 13:26:02,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 198.0, 93.0, 177.0, 115.0, 109.0, 213.0, 169.0, 110.0, 168.0]
2025-05-13 13:26:02,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 22 minutes, 18 seconds)
2025-05-13 13:29:23,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:29:26,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 428.04166 ± 100.642
2025-05-13 13:29:26,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [466.276, 487.17456, 358.10022, 313.28882, 487.5291, 349.67175, 388.18192, 673.78986, 406.8359, 349.56833]
2025-05-13 13:29:26,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 204.0, 173.0, 151.0, 214.0, 173.0, 167.0, 259.0, 179.0, 160.0]
2025-05-13 13:29:26,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (428.04) for latency ExtremeClogL1U23
2025-05-13 13:29:26,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 18 minutes, 35 seconds)
2025-05-13 13:32:49,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:32:53,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 409.78555 ± 105.521
2025-05-13 13:32:53,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [420.20013, 270.23834, 375.17236, 409.96347, 306.25928, 674.71216, 433.2026, 366.9016, 361.2678, 479.9378]
2025-05-13 13:32:53,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 132.0, 170.0, 189.0, 143.0, 253.0, 175.0, 166.0, 170.0, 216.0]
2025-05-13 13:32:53,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 15 minutes, 15 seconds)
2025-05-13 13:36:16,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:36:18,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 189.08998 ± 40.162
2025-05-13 13:36:18,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [168.21634, 163.52814, 188.68472, 165.3634, 184.28624, 158.71169, 304.62234, 175.80339, 189.9831, 191.70032]
2025-05-13 13:36:18,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 101.0, 108.0, 98.0, 100.0, 95.0, 152.0, 101.0, 104.0, 110.0]
2025-05-13 13:36:18,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 11 minutes, 49 seconds)
2025-05-13 13:39:41,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:39:44,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 339.10443 ± 106.762
2025-05-13 13:39:44,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [524.3114, 279.6337, 172.01764, 503.89755, 344.94818, 317.86322, 301.32904, 384.4296, 209.12354, 353.49078]
2025-05-13 13:39:44,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [203.0, 143.0, 100.0, 214.0, 164.0, 151.0, 151.0, 177.0, 116.0, 165.0]
2025-05-13 13:39:44,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 8 minutes, 30 seconds)
2025-05-13 13:43:06,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:43:09,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 350.58380 ± 121.139
2025-05-13 13:43:09,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [239.19316, 305.20746, 495.88065, 243.72505, 256.91406, 234.42667, 620.9446, 333.5776, 413.4924, 362.4763]
2025-05-13 13:43:09,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 151.0, 204.0, 134.0, 134.0, 149.0, 253.0, 162.0, 203.0, 170.0]
2025-05-13 13:43:09,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 5 minutes, 3 seconds)
2025-05-13 13:46:31,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:46:33,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 312.51712 ± 92.732
2025-05-13 13:46:33,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [198.11238, 277.76047, 274.38657, 368.9386, 416.9809, 275.22504, 185.50874, 390.91223, 483.06552, 254.28069]
2025-05-13 13:46:33,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 141.0, 138.0, 172.0, 181.0, 133.0, 108.0, 173.0, 203.0, 130.0]
2025-05-13 13:46:33,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 1 minute, 37 seconds)
2025-05-13 13:49:58,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:50:00,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 323.73734 ± 165.081
2025-05-13 13:50:00,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [466.07397, 156.41083, 338.91397, 639.90924, 164.93927, 159.6882, 231.8401, 558.16125, 277.91602, 243.52034]
2025-05-13 13:50:00,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 93.0, 144.0, 247.0, 96.0, 97.0, 124.0, 198.0, 131.0, 134.0]
2025-05-13 13:50:00,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 58 minutes, 14 seconds)
2025-05-13 13:53:22,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:53:25,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 364.46017 ± 122.495
2025-05-13 13:53:25,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [339.77405, 196.1953, 248.7699, 602.91516, 396.1301, 436.82007, 462.99222, 369.09787, 408.93024, 182.97675]
2025-05-13 13:53:25,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 123.0, 129.0, 228.0, 171.0, 198.0, 198.0, 160.0, 194.0, 110.0]
2025-05-13 13:53:25,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 54 minutes, 46 seconds)
2025-05-13 13:56:48,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:56:51,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 320.48819 ± 122.903
2025-05-13 13:56:51,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [406.23697, 111.86116, 354.1824, 120.43424, 366.33405, 443.53076, 411.42932, 186.31221, 434.84238, 369.71857]
2025-05-13 13:56:51,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 85.0, 157.0, 89.0, 165.0, 192.0, 188.0, 114.0, 189.0, 176.0]
2025-05-13 13:56:51,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 51 minutes, 22 seconds)
2025-05-13 14:00:14,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:00:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 293.40814 ± 109.274
2025-05-13 14:00:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [406.56213, 155.25099, 223.4868, 456.19485, 395.23438, 115.361465, 194.7902, 340.01233, 310.57333, 336.61496]
2025-05-13 14:00:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [172.0, 97.0, 127.0, 212.0, 187.0, 76.0, 114.0, 157.0, 149.0, 161.0]
2025-05-13 14:00:16,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 47 minutes, 55 seconds)
2025-05-13 14:03:40,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:03:43,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 353.96417 ± 138.812
2025-05-13 14:03:43,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [293.73462, 128.61281, 214.59996, 319.4486, 598.4539, 231.41614, 523.81525, 359.54105, 414.3525, 455.66672]
2025-05-13 14:03:43,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 82.0, 113.0, 145.0, 242.0, 121.0, 201.0, 164.0, 179.0, 196.0]
2025-05-13 14:03:43,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 44 minutes, 37 seconds)
2025-05-13 14:07:05,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:07:08,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 365.09512 ± 104.304
2025-05-13 14:07:08,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [267.56265, 382.72668, 498.7492, 227.7511, 440.4364, 454.24197, 517.80884, 321.15976, 218.3962, 322.11832]
2025-05-13 14:07:08,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 176.0, 202.0, 124.0, 190.0, 207.0, 214.0, 159.0, 124.0, 155.0]
2025-05-13 14:07:08,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 41 minutes, 5 seconds)
2025-05-13 14:10:29,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:10:32,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 340.79602 ± 112.940
2025-05-13 14:10:32,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [461.19516, 160.47246, 369.5812, 373.60132, 348.39124, 173.96506, 294.9279, 448.8956, 515.7715, 261.1589]
2025-05-13 14:10:32,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 97.0, 164.0, 170.0, 163.0, 112.0, 140.0, 195.0, 211.0, 134.0]
2025-05-13 14:10:32,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 37 minutes, 39 seconds)
2025-05-13 14:13:55,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:13:58,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 399.68048 ± 153.064
2025-05-13 14:13:58,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [416.0847, 423.45673, 660.30646, 315.80347, 191.52896, 389.80032, 267.94913, 265.7822, 681.7269, 384.36566]
2025-05-13 14:13:58,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [184.0, 187.0, 262.0, 149.0, 116.0, 178.0, 155.0, 129.0, 306.0, 174.0]
2025-05-13 14:13:58,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 34 minutes, 13 seconds)
2025-05-13 14:17:22,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:17:25,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 379.11774 ± 141.933
2025-05-13 14:17:25,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [644.5455, 451.5573, 378.648, 237.00188, 403.76077, 236.77081, 257.82147, 216.116, 377.51813, 587.43774]
2025-05-13 14:17:25,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 200.0, 179.0, 122.0, 176.0, 120.0, 133.0, 122.0, 185.0, 230.0]
2025-05-13 14:17:25,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 30 minutes, 51 seconds)
2025-05-13 14:20:49,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:20:52,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 369.15100 ± 177.373
2025-05-13 14:20:52,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [363.0755, 230.44315, 363.99908, 597.7041, 473.207, 730.306, 145.40135, 316.48755, 154.14818, 316.73782]
2025-05-13 14:20:52,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [169.0, 130.0, 159.0, 232.0, 203.0, 291.0, 101.0, 148.0, 99.0, 156.0]
2025-05-13 14:20:52,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 27 minutes, 25 seconds)
2025-05-13 14:24:13,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:24:16,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 294.50034 ± 110.757
2025-05-13 14:24:16,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [344.09494, 394.48758, 466.65695, 482.04703, 186.68726, 220.65865, 223.29353, 180.30096, 216.28992, 230.4869]
2025-05-13 14:24:16,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 181.0, 197.0, 205.0, 108.0, 124.0, 116.0, 105.0, 120.0, 116.0]
2025-05-13 14:24:16,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes, 59 seconds)
2025-05-13 14:27:37,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:27:40,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 313.84491 ± 127.015
2025-05-13 14:27:40,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [225.14774, 146.75864, 390.86868, 136.64146, 226.64159, 401.05585, 327.23483, 483.30493, 521.06604, 279.72922]
2025-05-13 14:27:40,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 90.0, 171.0, 80.0, 123.0, 184.0, 152.0, 204.0, 217.0, 141.0]
2025-05-13 14:27:40,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 20 minutes, 33 seconds)
2025-05-13 14:31:05,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:31:07,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 259.23285 ± 70.367
2025-05-13 14:31:07,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [239.19864, 352.3973, 231.75938, 296.46594, 201.00623, 285.50488, 324.06488, 266.82687, 303.24615, 91.85833]
2025-05-13 14:31:07,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 164.0, 123.0, 144.0, 109.0, 139.0, 150.0, 130.0, 146.0, 71.0]
2025-05-13 14:31:07,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 17 minutes, 9 seconds)
2025-05-13 14:34:30,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:34:32,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 209.21419 ± 108.105
2025-05-13 14:34:32,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [92.49683, 320.73526, 269.1725, 177.72418, 374.79834, 125.9899, 361.06824, 95.77837, 189.97504, 84.40319]
2025-05-13 14:34:32,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 148.0, 144.0, 101.0, 166.0, 79.0, 164.0, 68.0, 106.0, 60.0]
2025-05-13 14:34:32,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 41 seconds)
2025-05-13 14:37:53,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:37:56,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 364.28793 ± 148.558
2025-05-13 14:37:56,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [199.82744, 419.77982, 335.68, 422.50488, 377.81055, 413.64398, 339.48624, 693.47894, 91.27154, 349.3961]
2025-05-13 14:37:56,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 179.0, 158.0, 185.0, 166.0, 181.0, 152.0, 243.0, 68.0, 160.0]
2025-05-13 14:37:56,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 10 minutes, 14 seconds)
2025-05-13 14:41:19,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:41:22,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 325.33563 ± 160.891
2025-05-13 14:41:22,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [247.16188, 216.8426, 693.48627, 226.8651, 455.5981, 466.67496, 366.9567, 226.0338, 216.39233, 137.34473]
2025-05-13 14:41:22,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [147.0, 135.0, 268.0, 117.0, 202.0, 204.0, 167.0, 118.0, 113.0, 86.0]
2025-05-13 14:41:22,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 50 seconds)
2025-05-13 14:44:45,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:44:49,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 465.13629 ± 138.863
2025-05-13 14:44:49,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [365.18506, 608.2877, 408.11176, 551.19055, 766.5896, 244.77362, 400.92746, 381.0398, 485.5256, 439.73148]
2025-05-13 14:44:49,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 237.0, 176.0, 223.0, 299.0, 132.0, 175.0, 158.0, 206.0, 184.0]
2025-05-13 14:44:49,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1226 [INFO]: New best (465.14) for latency ExtremeClogL1U23
2025-05-13 14:44:49,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 25 seconds)
2025-05-13 14:48:14,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:48:17,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1221 [DEBUG]: Total Reward: 393.01068 ± 150.073
2025-05-13 14:48:17,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1222 [DEBUG]: All rewards: [194.196, 418.7682, 423.5385, 219.2972, 356.73956, 425.66916, 763.4696, 282.7523, 436.30518, 409.3711]
2025-05-13 14:48:17,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 187.0, 185.0, 116.0, 161.0, 182.0, 315.0, 139.0, 195.0, 173.0]
2025-05-13 14:48:17,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1251 [DEBUG]: Training session finished
