PROFILING HISTORY
All runs below use roughly the same setup (20 workers with 20 envs each), training on the VizDoom Battle environment
with widescreen rendering (this is important!), so this is not the fastest possible configuration. What matters is consistency: keeping the setup fixed makes performance regressions easier to spot.
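
To compare runs, it helps to turn a timing line into numbers. The following is a minimal sketch, not part of the original codebase: `parse_timing` is a hypothetical helper that assumes the "name: seconds" layout seen in the lines below and only handles lines containing the lowercase word "timing".

```python
import re

# Hypothetical helper (an assumption, not from the original code):
# matches "name: 1.2345" pairs as they appear in the timing lines.
TIMING_PAIR = re.compile(r"(\w+): ([0-9]+(?:\.[0-9]+)?)")


def parse_timing(line: str) -> dict:
    """Return {stage_name: seconds} for one timing line of this log."""
    # Keep only the text after "timing" so the timestamp, process id and
    # rollout counters in the prefix are not picked up.
    _, _, tail = line.partition("timing")
    return {name: float(value) for name, value in TIMING_PAIR.findall(tail)}


sample = "[2019-11-27 22:06:02,056] Gpu learner timing: init: 3.1058, work: 0.0001"
print(parse_timing(sample))
```

Two parsed dicts can then be compared key by key to see which stage got slower between versions.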

WITHOUT TRAINING:
[2019-11-27 22:06:02,056] Gpu learner timing: init: 3.1058, work: 0.0001
[2019-11-27 22:06:02,059] Gpu worker timing: init: 2.7746, deserialize: 4.6964, to_device: 3.8011, forward: 14.2683, serialize: 8.4691, postprocess: 9.8058, policy_step: 32.8482, weight_update: 0.0005, gpu_waiting: 2.0623
[2019-11-27 22:06:02,065] Gpu worker timing: init: 5.4169, deserialize: 3.6640, to_device: 3.1592, forward: 13.2836, serialize: 6.3964, postprocess: 7.6095, policy_step: 27.9706, weight_update: 0.0005, gpu_waiting: 1.8249
[2019-11-27 22:06:02,067] Env runner 0: timing waiting: 0.8708, reset: 27.0515, save_policy_outputs: 0.0006, env_step: 26.0700, finalize: 0.3460, overhead: 1.1313, format_output: 0.3095, one_step: 0.0272, work: 36.5773
[2019-11-27 22:06:02,079] Env runner 1: timing waiting: 0.8468, reset: 26.8022, save_policy_outputs: 0.0008, env_step: 26.1251, finalize: 0.3565, overhead: 1.1361, format_output: 0.3224, one_step: 0.0269, work: 36.6139

WITH TRAINING (1 epoch):
[2019-11-27 22:24:20,590] Gpu worker timing: init: 2.9078, deserialize: 5.5495, to_device: 5.6693, forward: 15.7285, serialize: 10.0113, postprocess: 13.4533, policy_step: 40.7373, weight_update: 0.0007, gpu_waiting: 2.0482
[2019-11-27 22:24:20,596] Gpu worker timing: init: 4.8333, deserialize: 4.6056, to_device: 5.0975, forward: 14.8585, serialize: 8.0576, postprocess: 11.3531, policy_step: 36.2226, weight_update: 0.0006, gpu_waiting: 1.9836
[2019-11-27 22:24:20,606] Env runner 1: timing waiting: 0.9328, reset: 27.9299, save_policy_outputs: 0.0005, env_step: 31.6222, finalize: 0.4432, overhead: 1.3904, format_output: 0.3692, one_step: 0.0309, work: 44.7151
[2019-11-27 22:24:20,622] Env runner 0: timing waiting: 1.0276, reset: 27.5389, save_policy_outputs: 0.0009, env_step: 31.5377, finalize: 0.4614, overhead: 1.4103, format_output: 0.3564, one_step: 0.0269, work: 44.6398
[2019-11-27 22:24:23,072] Gpu learner timing: init: 3.3635, last_values: 0.4506, gae: 3.5159, numpy: 0.6232, finalize: 4.6129, buffer: 6.4776, update: 16.3922, train: 26.0528, work: 37.2159
[2019-11-27 22:24:52,618] Collected 1012576, FPS: 22177.3

Env runner 0: timing waiting: 2.5731, reset: 5.0527, save_policy_outputs: 0.0007, env_step: 28.7689, overhead: 1.1565, format_inputs: 0.3170, one_step: 0.0276, work: 39.3452
[2019-12-06 19:01:42,042] Env runner 1: timing waiting: 2.5900, reset: 4.9147, save_policy_outputs: 0.0004, env_step: 28.8585, overhead: 1.1266, format_inputs: 0.3087, one_step: 0.0254, work: 39.3333
[2019-12-06 19:01:42,227] Gpu worker timing: init: 2.8738, weight_update: 0.0006, deserialize: 7.6602, to_device: 5.3244, forward: 8.1527, serialize: 14.3651, postprocess: 17.5523, policy_step: 38.8745, gpu_waiting: 0.5276
[2019-12-06 19:01:42,232] Gpu learner timing: init: 3.3448, last_values: 0.2737, gae: 3.0682, numpy: 0.5308, finalize: 3.8888, buffer: 5.2451, forw_head: 0.2639, forw_core: 0.8289, forw_tail: 0.5334, clip: 4.5709, update: 12.0888, train: 19.6720, work: 28.8663
[2019-12-06 19:01:42,723] Collected 1007616, FPS: 23975.2

Last version using Plasma:
[2020-01-07 00:24:27,690] Env runner 0: timing wait_actor: 0.0001, waiting: 2.2242, reset: 13.0768, save_policy_outputs: 0.0004, env_step: 27.5735, overhead: 1.0524, format_inputs: 0.2934, enqueue_policy_requests: 4.6075, complete_rollouts: 3.2226, one_step: 0.0250, work: 37.9023
[2020-01-07 00:24:27,697] Env runner 1: timing wait_actor: 0.0042, waiting: 2.2486, reset: 13.3085, save_policy_outputs: 0.0005, env_step: 27.5389, overhead: 1.0206, format_inputs: 0.2921, enqueue_policy_requests: 4.5829, complete_rollouts: 3.3319, one_step: 0.0240, work: 37.8813
[2020-01-07 00:24:27,890] Gpu worker timing: init: 3.0419, wait_policy: 0.0002, gpu_waiting: 0.4060, weight_update: 0.0007, deserialize: 0.0923, to_device: 4.7866, forward: 6.8820, serialize: 13.8782, postprocess: 16.9365, policy_step: 28.8341, one_step: 0.0000, work: 39.9577
[2020-01-07 00:24:27,906] GPU learner timing: buffers: 0.0461, tensors: 8.7751, prepare: 8.8510
[2020-01-07 00:24:27,907] Train loop timing: init: 3.0417, train_wait: 0.0969, bptt: 2.6350, vtrace: 5.7421, losses: 0.7799, clip: 4.6204, update: 9.1475, train: 21.3880
[2020-01-07 00:24:28,213] Collected {0: 1015808}, FPS: 25279.4
[2020-01-07 00:24:28,214] Timing: experience: 40.1832
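
The summary lines appear to be related by FPS = collected frames / experience time. This relation is inferred from the numbers above and is not stated in the log itself; the sketch below (with a hypothetical `fps` helper) simply checks it against the reported value.

```python
def fps(collected_frames: int, experience_seconds: float) -> float:
    """Frames per second, assuming FPS = collected / experience time."""
    return collected_frames / experience_seconds


# Values taken from the "Collected" and "Timing: experience" lines above.
reported_fps = 25279.4
computed_fps = fps(1015808, 40.1832)

# The experience time is rounded in the log, so allow a small tolerance.
print(abs(computed_fps - reported_fps) < 0.5)
```

If the relation holds, FPS differences between versions can be read directly as differences in experience time for a similar number of collected frames.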

Version using PyTorch tensors with shared memory:
[2020-01-07 01:08:05,569] Env runner 0: timing wait_actor: 0.0003, waiting: 0.6292, reset: 12.4041, save_policy_outputs: 0.4311, env_step: 30.1347, overhead: 4.3134, enqueue_policy_requests: 0.0677, complete_rollouts: 0.0274, one_step: 0.0261, work: 35.3962, wait_buffers: 0.0164
[2020-01-07 01:08:05,596] Env runner 1: timing wait_actor: 0.0003, waiting: 0.7102, reset: 12.7194, save_policy_outputs: 0.4400, env_step: 30.1091, overhead: 4.2822, enqueue_policy_requests: 0.0630, complete_rollouts: 0.0234, one_step: 0.0270, work: 35.3405, wait_buffers: 0.0162
[2020-01-07 01:08:05,762] Gpu worker timing: init: 2.8383, wait_policy: 0.0000, gpu_waiting: 2.3759, loop: 4.3098, weight_update: 0.0006, updates: 0.0008, deserialize: 0.8207, to_device: 6.8636, forward: 15.0019, postprocess: 2.4855, handle_policy_step: 29.5612, one_step: 0.0000, work: 33.9772
[2020-01-07 01:08:05,896] Train loop timing: init: 2.9927, train_wait: 0.0001, bptt: 2.6755, vtrace: 6.3307, losses: 0.7319, update: 4.6164, train: 22.0022
[2020-01-07 01:08:10,888] Collected {0: 1015808}, FPS: 28900.6
[2020-01-07 01:08:10,888] Timing: experience: 35.1483

Version V53, Torch 1.3.1
[2020-01-09 20:33:23,540] Env runner 0: timing wait_actor: 0.0002, waiting: 0.7097, reset: 5.2281, save_policy_outputs: 0.3789, env_step: 29.3372, overhead: 4.2642, enqueue_policy_requests: 0.0660, complete_rollouts: 0.0313, one_step: 0.0244, work: 34.5037, wait_buffers: 0.0213
[2020-01-09 20:33:23,556] Env runner 1: timing wait_actor: 0.0009, waiting: 0.6965, reset: 5.3100, save_policy_outputs: 0.3989, env_step: 29.3533, overhead: 4.2378, enqueue_policy_requests: 0.0685, complete_rollouts: 0.0290, one_step: 0.0261, work: 34.5326, wait_buffers: 0.0165
[2020-01-09 20:33:23,711] Gpu worker timing: init: 1.3378, wait_policy: 0.0016, gpu_waiting: 2.3035, loop: 4.5320, weight_update: 0.0006, updates: 0.0008, deserialize: 0.8223, to_device: 6.4952, forward: 14.8064, postprocess: 2.4568, handle_policy_step: 28.7065, one_step: 0.0000, work: 33.3578
[2020-01-09 20:33:23,816] GPU learner timing: extract: 0.0137, buffers: 0.0437, tensors: 6.6962, buff_ready: 0.1400, prepare: 6.9068
[2020-01-09 20:33:23,892] Train loop timing: init: 1.3945, train_wait: 0.0000, bptt: 2.2262, vtrace: 5.5308, losses: 0.6580, update: 3.6261, train: 19.8292
[2020-01-09 20:33:28,787] Collected {0: 1015808}, FPS: 29476.0
[2020-01-09 20:33:28,787] Timing: experience: 34.4622

Version V60
[2020-01-19 03:25:14,014] Env runner 0: timing wait_actor: 0.0001, waiting: 9.7151, reset: 41.1152, save_policy_outputs: 0.5734, env_step: 39.1791, overhead: 6.5181, enqueue_policy_requests: 0.1089, complete_rollouts: 0.2901, one_step: 0.0163, work: 47.2741, wait_buffers: 0.2795
[2020-01-19 03:25:14,015] Env runner 1: timing wait_actor: 0.0001, waiting: 10.1184, reset: 41.6788, save_policy_outputs: 0.5846, env_step: 39.1234, overhead: 6.4405, enqueue_policy_requests: 0.1021, complete_rollouts: 0.0304, one_step: 0.0154, work: 46.8807, wait_buffers: 0.0202
[2020-01-19 03:25:14,201] Gpu worker timing: init: 1.3160, wait_policy: 0.0009, gpu_waiting: 9.5548, loop: 9.7118, weight_update: 0.0003, updates: 0.0005, deserialize: 1.5404, to_device: 12.7886, forward: 12.9712, postprocess: 4.9893, handle_policy_step: 37.9686, one_step: 0.0000, work: 47.9418
[2020-01-19 03:25:14,221] GPU learner timing: extract: 0.0392, buffers: 0.0745, tensors: 11.0697, buff_ready: 0.4808, prepare: 11.7095
[2020-01-19 03:25:14,321] Train loop timing: init: 1.4332, train_wait: 0.0451, tensors_gpu_float: 4.3031, bptt: 5.0880, vtrace: 2.4773, losses: 1.9113, update: 7.6270, train: 36.8291
[2020-01-19 03:25:14,465] Collected {0: 2015232}, FPS: 35779.2
[2020-01-19 03:25:14,465] Timing: experience: 56.3241

Version V61, cudnn benchmark=True
[2020-01-19 18:19:31,416] Env runner 0: timing wait_actor: 0.0002, waiting: 8.8857, reset: 41.9806, save_policy_outputs: 0.5918, env_step: 38.3737, overhead: 6.3290, enqueue_policy_requests: 0.1026, complete_rollouts: 0.0286, one_step: 0.0141, work: 46.0301, wait_buffers: 0.0181
[2020-01-19 18:19:31,420] Env runner 1: timing wait_actor: 0.0002, waiting: 9.0225, reset: 42.5019, save_policy_outputs: 0.5540, env_step: 38.1044, overhead: 6.2374, enqueue_policy_requests: 0.1140, complete_rollouts: 0.2770, one_step: 0.0169, work: 45.8830, wait_buffers: 0.2664
[2020-01-19 18:19:31,610] Gpu worker timing: init: 1.3633, wait_policy: 0.0037, gpu_waiting: 9.4391, loop: 9.6261, weight_update: 0.0005, updates: 0.0007, deserialize: 1.4722, to_device: 12.5683, forward: 12.8369, postprocess: 4.9932, handle_policy_step: 36.1579, one_step: 0.0000, work: 45.9985
[2020-01-19 18:19:31,624] GPU learner timing: extract: 0.0376, buffers: 0.0769, tensors: 11.2689, buff_ready: 0.4423, prepare: 11.8845
[2020-01-19 18:19:31,630] Train loop timing: init: 1.4804, train_wait: 0.0481, tensors_gpu_float: 4.1565, bptt: 5.2692, vtrace: 2.2177, losses: 1.7225, update: 7.5387, train: 31.5856
[2020-01-19 18:19:31,797] Collected {0: 1966080}, FPS: 36238.5
[2020-01-19 18:19:31,797] Timing: experience: 54.2540

Version V64
--env=doom_battle_hybrid --train_for_seconds=360000 --algo=APPO --env_frameskip=4 --use_rnn=True --reward_scale=0.5 --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v64_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2 --init_workers_parallel=7 --max_grad_norm=0.0
[2020-01-25 22:44:52,845] Env runner 1: timing wait_actor: 0.0068, waiting: 9.9934, reset: 16.0501, save_policy_outputs: 0.5885, env_step: 38.8288, overhead: 6.5314, enqueue_policy_requests: 0.1232, complete_rollouts: 0.0299, one_step: 0.0167, work: 46.7084, wait_buffers: 0.0195
[2020-01-25 22:44:52,846] Env runner 0: timing wait_actor: 0.0002, waiting: 9.6433, reset: 14.8835, save_policy_outputs: 0.5988, env_step: 39.0076, overhead: 6.6748, enqueue_policy_requests: 0.1294, complete_rollouts: 0.0318, one_step: 0.0167, work: 47.0693, wait_buffers: 0.0211
[2020-01-25 22:44:53,037] Gpu worker timing: init: 1.3123, wait_policy: 0.0024, gpu_waiting: 9.7236, loop: 10.5022, weight_update: 0.0005, updates: 0.0007, deserialize: 1.5961, to_device: 12.5846, forward: 13.2160, postprocess: 5.1388, handle_policy_step: 36.8111, one_step: 0.0000, work: 47.4999
[2020-01-25 22:44:53,048] GPU learner timing: extract: 0.0329, buffers: 0.0760, tensors: 11.2344, buff_ready: 0.4467, prepare: 11.8263
[2020-01-25 22:44:53,060] Train loop timing: init: 1.4357, train_wait: 0.1186, tensors_gpu_float: 4.1724, bptt: 5.2798, vtrace: 2.4177, losses: 1.8281, update: 7.7311, train: 32.4878
[2020-01-25 22:44:53,219] Collected {0: 2015232}, FPS: 35969.4
[2020-01-25 22:44:53,219] Timing: experience: 56.0263

Version V66
--env=doom_benchmark --train_for_seconds=360000 --algo=APPO --env_frameskip=4 --use_rnn=True  --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v66_test3 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2 --init_workers_parallel=7
[2020-02-05 02:21:08,568][06063] Env runner 0, rollouts 780: timing wait_actor: 0.0002, waiting: 7.0481, reset: 9.3021, save_policy_outputs: 0.5583, env_step: 34.4028, overhead: 6.0476, complete_rollouts: 0.3592, enqueue_policy_requests: 0.1203, one_step: 0.0192, work: 42.1171, wait_buffers: 0.3469
[2020-02-05 02:21:08,596][04810] Env runner 1, rollouts 770: timing wait_actor: 0.0001, waiting: 7.4001, reset: 23.0733, save_policy_outputs: 0.5752, env_step: 34.1180, overhead: 5.8824, complete_rollouts: 0.4621, enqueue_policy_requests: 0.1337, one_step: 0.0091, work: 41.799, wait_buffers: 0.4502
[2020-02-05 02:21:08,764][04801] Policy worker timing: init: 1.4682, wait_policy: 0.0029, gpu_waiting: 101.1209, weight_update: 0.0003, updates: 0.0004, loop: 9.0389, handle_policy_step: 29.3456, one_step: 0.0000, work: 38.6256, deserialize: 1.2242, to_device: 10.7996, forward: 6.7556, postprocess: 7.2947
[2020-02-05 02:21:08,780][04774] GPU learner timing: extract: 0.0555, buffers: 0.0706, tensors: 12.0109, buff_ready: 0.5418, prepare: 12.6917
[2020-02-05 02:21:09,082][04715] Collected {0: 1974272}, FPS: 40185.7
[2020-02-05 02:21:09,082][04715] Timing: experience: 49.1287

Version V66
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True  --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v66_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-02-11 20:59:56,337][28791] Env runner 0, rollouts 800: timing wait_actor: 0.0002, waiting: 6.1882, reset: 15.3720, save_policy_outputs: 0.5901, env_step: 34.3711, overhead: 6.1186, complete_rollouts: 0.6102, enqueue_policy_requests: 0.1157, one_step: 0.0151, work: 42.4098, wait_buffers: 0.5976
[2020-02-11 20:59:56,293][28793] Env runner 1, rollouts 790: timing wait_actor: 0.0022, waiting: 6.3542, reset: 14.9161, save_policy_outputs: 0.5772, env_step: 34.3059, overhead: 6.0034, complete_rollouts: 0.6385, enqueue_policy_requests: 0.1143, one_step: 0.0155, work: 42.2383, wait_buffers: 0.6263
[2020-02-11 20:59:56,322][28790] Policy worker timing: init: 1.9307, wait_policy: 0.0000, gpu_waiting: 25.3407, weight_update: 0.0004, updates: 0.0007, loop: 8.7840, handle_policy_step: 29.2553, one_step: 0.0023, work: 38.2444, deserialize: 1.1606, to_device: 10.8993, forward: 6.6921, postprocess: 7.1333
[2020-02-11 20:59:56,358][28767] GPU learner timing: extract: 0.0488, buffers: 0.0732, tensors: 12.1752, buff_ready: 0.4165, prepare: 12.7439
[2020-02-11 20:59:56,391][28767] Train loop timing: init: 1.3657, train_wait: 0.0968, tensors_gpu_float: 4.1287, bptt: 5.5166, vtrace: 2.6644, losses: 0.6868, clip: 6.1476, update: 12.7627, train: 31.3387
[2020-02-11 20:59:56,606][28705] Collected {0: 2015232}, FPS: 41630.1
[2020-02-11 20:59:56,606][28705] Timing: experience: 48.4080

Version V69 (numpy arrays with dtype=object to access shared memory, on all components)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True  --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v69_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-03-13 02:01:33,655][11226] Env runner 0, rollouts 800: timing wait_actor: 0.0001, waiting: 6.8923, reset: 9.7055, save_policy_outputs: 1.1013, env_step: 34.8643, overhead: 2.6461, complete_rollouts: 0.0163, enqueue_policy_requests: 0.1268, one_step: 0.0149, work: 40.6362, wait_buffers: 0.0416
[2020-03-13 02:01:33,717][11228] Env runner 1, rollouts 780: timing wait_actor: 0.0056, waiting: 7.2380, reset: 12.4794, save_policy_outputs: 1.1331, env_step: 34.3026, overhead: 2.6869, complete_rollouts: 0.0106, enqueue_policy_requests: 0.1320, one_step: 0.0149, work: 40.2842, wait_buffers: 0.2142
[2020-03-13 02:01:33,701][11225] Policy worker timing: init: 1.7734, wait_policy: 0.0050, gpu_waiting: 23.6931, weight_update: 0.0005, updates: 0.0007, loop: 9.1436, handle_policy_step: 28.1835, one_step: 0.0000, work: 37.5288, deserialize: 1.2307, to_device: 11.0009, forward: 6.2592, postprocess: 6.3423
[2020-03-13 02:01:33,711][11207] GPU learner timing: extract: 0.2552, buffers: 0.0665, tensors: 12.2804, buff_ready: 0.4470, prepare: 12.8576
[2020-03-13 02:01:33,739][11207] Train loop timing: init: 1.3547, train_wait: 0.0001, tensors_gpu_float: 4.7140, bptt: 6.1209, vtrace: 2.3818, losses: 0.7224, clip: 6.4046, update: 13.7435, train: 33.4576
[2020-03-13 02:01:33,937][11111] Collected {0: 2007040}, FPS: 42423.2
[2020-03-13 02:01:33,937][11111] Timing: experience: 47.3100

Version V70 (fast C++ queues)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True  --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v70_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-03-13 05:41:33,867][27104] Env runner 0, rollouts 810: timing wait_actor: 0.0000, waiting: 4.8948, reset: 12.3130, save_policy_outputs: 1.1351, env_step: 34.4284, overhead: 2.6341, complete_rollouts: 0.0115, enqueue_policy_requests: 0.1613, one_step: 0.0143, work: 40.2222
[2020-03-13 05:41:33,869][27106] Env runner 1, rollouts 790: timing wait_actor: 0.0000, waiting: 5.6380, reset: 10.6275, save_policy_outputs: 1.0901, env_step: 33.9549, overhead: 2.5067, complete_rollouts: 0.0110, enqueue_policy_requests: 0.1773, one_step: 0.0327, work: 39.5129
[2020-03-13 05:41:33,829][27103] Policy worker timing: init: 1.7287, wait_policy_total: 16.8390, wait_policy: 0.0022, handle_policy_step: 41.4499, one_step: 0.0043, weight_update: 0.0004, updates: 0.0007, deserialize: 1.4689, to_device: 13.1793, forward: 11.4226, postprocess: 10.6741
[2020-03-13 05:41:33,839][27085] GPU learner timing: extract: 0.2827, buffers: 0.0705, tensors: 11.6516, buff_ready: 0.5000, prepare: 12.3128
[2020-03-13 05:41:33,853][27085] Train loop timing: init: 1.3051, train_wait: 0.0000, tensors_gpu_float: 4.5831, bptt: 6.0317, vtrace: 2.4454, losses: 0.7526, clip: 6.2237, update: 13.2196, train: 32.9858
[2020-03-13 05:41:34,053][26983] Collected {0: 2015232}, FPS: 44822.8
[2020-03-13 05:41:34,053][26983] Timing: experience: 44.9600

Version V73 (process priority + to(device) in background thread on the learner)
Policy #0 lag: (min: 1.0, avg: 4.4, max: 9.0)
[2020-03-14 00:43:41,047][11371] Env runner 0, rollouts 800: timing wait_actor: 0.0005, waiting: 1.5608, reset: 15.0341, save_policy_outputs: 1.0898, env_step: 35.9769, overhead: 2.7699, complete_rollouts: 0.0159, enqueue_policy_requests: 0.1750, one_step: 0.0156, work: 41.9856
[2020-03-14 00:43:41,071][11372] Env runner 1, rollouts 780: timing wait_actor: 0.0000, waiting: 1.6715, reset: 15.3323, save_policy_outputs: 1.1191, env_step: 35.8312, overhead: 2.7640, complete_rollouts: 0.0153, enqueue_policy_requests: 0.1592, one_step: 0.0149, work: 41.9102, wait_buffers: 0.1198
[2020-03-14 00:43:41,310][11370] Policy worker avg. requests 4.36, timing: init: 1.7793, wait_policy_total: 16.5541, wait_policy: 0.0001, handle_policy_step: 41.6576, one_step: 0.0042, deserialize: 1.7135, obs_to_device: 5.2590, stack: 14.9778, forward: 13.9429, postprocess: 5.5726, weight_update: 0.0005
[2020-03-14 00:43:41,481][11353] Train loop timing: init: 1.3224, train_wait: 0.0000, bptt: 11.2281, vtrace: 1.6551, losses: 0.8895, clip: 6.6278, update: 14.9070, train: 42.3450
[2020-03-14 00:43:41,782][11353] GPU learner timing: extract: 0.1827, buffers: 0.0695, tensors: 8.7560, buff_ready: 0.3020, tensors_gpu_float: 5.9617, prepare: 15.1426
[2020-03-14 00:43:41,916][11262] Collected {0: 2015232}, FPS: 46341.4
[2020-03-14 00:43:41,916][11262] Timing: experience: 43.4867

Version V76 (PyTorch 1.4, faster indexing, faster batching. Improvements mostly in the learner)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True  --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v76_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-03-14 05:20:45,029][09278] Env runner 0, rollouts 810: timing wait_actor: 0.0000, waiting: 1.4923, reset: 11.3817, save_policy_outputs: 1.0137, env_step: 36.3330, overhead: 3.8381, complete_rollouts: 0.0150, enqueue_policy_requests: 0.1605, one_step: 0.0148, work: 43.4425
[2020-03-14 05:20:45,041][09279] Env runner 1, rollouts 780: timing wait_actor: 0.0070, waiting: 1.7666, reset: 13.1651, save_policy_outputs: 1.0001, env_step: 36.2509, overhead: 3.8008, complete_rollouts: 0.0194, enqueue_policy_requests: 0.1582, one_step: 0.0160, work: 43.1787
[2020-03-14 05:20:45,276][09277] Policy worker avg. requests 2.54, timing: init: 1.7812, wait_policy_total: 17.5261, wait_policy: 0.0022, handle_policy_step: 42.9551, one_step: 0.0022, deserialize: 1.7076, obs_to_device: 5.0701, stack: 14.8308, forward: 14.4700, postprocess: 5.5982, weight_update: 0.0004
[2020-03-14 05:20:45,383][09251] GPU learner timing: extract: 0.1829, buffers: 0.0667, batching: 5.2437, buff_ready: 0.2342, tensors_gpu_float: 7.8419, squeeze: 0.0156, prepare: 13.4659, batcher_mem: 5.1972
[2020-03-14 05:20:45,689][09251] Train loop timing: init: 1.3501, train_wait: 0.2603, forward_head: 10.0923, head_out_index: 0.0729, forward_core: 4.5256, bptt: 12.1633, tail: 0.7243, vtrace: 1.7064, losses: 0.4411, clip: 8.5073, update: 14.8810, train: 42.8166
[2020-03-14 05:20:45,814][09212] Collected {0: 2015232}, FPS: 44886.8
[2020-03-14 05:20:45,815][09212] Timing: experience: 44.8958
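
The FPS reported for V76 is lower than for V73, which is the kind of change this history is meant to surface. Below is a minimal sketch of an automated check; `find_regressions` and the 2% threshold are assumptions made for illustration, and single benchmark runs are noisy, so a flagged drop is a hint to re-run rather than proof of a regression.

```python
def find_regressions(history, threshold=0.02):
    """Return (previous, current, relative_drop) for each FPS drop above threshold.

    `history` is a list of (version, fps) pairs in chronological order.
    The 2% default threshold is an arbitrary assumption, not a project rule.
    """
    flagged = []
    for (prev_name, prev_fps), (cur_name, cur_fps) in zip(history, history[1:]):
        drop = (prev_fps - cur_fps) / prev_fps
        if drop > threshold:
            flagged.append((prev_name, cur_name, drop))
    return flagged


# FPS values copied from the "Collected ..., FPS: ..." lines above.
history = [("V70", 44822.8), ("V73", 46341.4), ("V76", 44886.8)]
for prev_name, cur_name, drop in find_regressions(history):
    print(f"{prev_name} -> {cur_name}: FPS dropped by {drop:.1%}")
```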

Version V78 (bptt improvements)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True  --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v78_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-03-14 21:07:21,403][11198] Env runner 0, rollouts 780: timing wait_actor: 0.0000, waiting: 1.5420, reset: 13.9346, save_policy_outputs: 0.9993, env_step: 35.5315, overhead: 3.6310, complete_rollouts: 0.0140, enqueue_policy_requests: 0.1555, one_step: 0.0157, work: 42.2544
[2020-03-14 21:07:21,406][11199] Env runner 1, rollouts 780: timing wait_actor: 0.0000, waiting: 1.6902, reset: 16.2373, save_policy_outputs: 0.9954, env_step: 35.4816, overhead: 3.5863, complete_rollouts: 0.0145, enqueue_policy_requests: 0.1549, one_step: 0.0148, work: 42.1175
[2020-03-14 21:07:21,653][11197] Policy worker avg. requests 3.72, timing: init: 1.7326, wait_policy_total: 16.2322, wait_policy: 0.0051, handle_policy_step: 41.8431, one_step: 0.0000, deserialize: 1.6981, obs_to_device: 5.2822, stack: 14.7678, forward: 14.0202, postprocess: 5.5750, weight_update: 0.0004
[2020-03-14 21:07:21,760][11180] GPU learner timing: extract: 0.1839, buffers: 0.0659, batching: 5.1729, buff_ready: 0.2354, tensors_gpu_float: 7.1679, squeeze: 0.0112, prepare: 12.7145, batcher_mem: 5.1284
[2020-03-14 21:07:22,066][11180] Train loop timing: init: 1.3305, train_wait: 0.2645, forward_head: 9.1401, bptt_initial: 0.7022, forward_core: 5.8878, bptt_rnn_states: 3.6712, bptt: 9.7196, tail: 0.5777, vtrace: 2.0181, losses: 0.4139, clip: 9.0324, update: 15.5247, train: 40.9327
[2020-03-14 21:07:22,205][11140] Collected {0: 2015232}, FPS: 46035.7
[2020-03-14 21:07:22,205][11140] Timing: experience: 43.7754

Version V81 (after DMLab-related refactoring and adding Dummy Sampler)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True  --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v81_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-04-03 02:39:43,305][20114] Env runner 0, rollouts 820: timing wait_actor: 0.0000, waiting: 0.3761, reset: 9.6688, save_policy_outputs: 0.9744, env_step: 35.8396, overhead: 3.8425, complete_rollouts: 0.0153, enqueue_policy_requests: 0.1639, one_step: 0.0147, work: 42.8321
[2020-04-03 02:39:43,316][20115] Env runner 1, rollouts 800: timing wait_actor: 0.0000, waiting: 0.4349, reset: 12.9288, save_policy_outputs: 0.9748, env_step: 36.0682, overhead: 3.6751, complete_rollouts: 0.0156, enqueue_policy_requests: 0.1574, one_step: 0.0167, work: 42.7851
[2020-04-03 02:39:43,557][20113] Policy worker avg. requests 2.18, timing: init: 1.7894, wait_policy_total: 14.0706, wait_policy: 0.0001, handle_policy_step: 41.2697, one_step: 0.0025, deserialize: 1.4548, obs_to_device: 4.5620, stack: 13.0836, forward: 15.5851, postprocess: 4.8575, weight_update: 0.0005
[2020-04-03 02:39:43,664][20096] GPU learner timing: extract: 0.1808, buffers: 0.0643, batching: 5.1004, buff_ready: 0.2323, tensors_gpu_float: 7.9368, squeeze: 0.0091, prepare: 13.4034, batcher_mem: 5.0559
[2020-04-03 02:39:43,971][20096] Train loop timing: init: 1.3558, train_wait: 0.3017, forward_head: 9.8845, bptt_initial: 0.7820, bptt_forward_core: 7.4162, bptt_rnn_states: 4.5037, bptt: 12.0998, tail: 0.5640, vtrace: 0.9889, losses: 0.3733, clip: 8.1355, update: 14.2952, train: 41.5756
[2020-04-03 02:39:44,107][20058] Collected {0: 2015232}, FPS: 46778.4
[2020-04-03 02:39:44,107][20058] Timing: experience: 43.0804

Version V83 (Mostly refactoring)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v83_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-04-04 02:03:59,045][02465] Env runner 1, rollouts 800: timing wait_actor: 0.0000, waiting: 0.4651, reset: 10.2102, save_policy_outputs: 0.9537, env_step: 36.2177, overhead: 3.7336, complete_rollouts: 0.0158, enqueue_policy_requests: 0.1606, one_step: 0.0152, work: 43.0656
[2020-04-04 02:03:59,060][02464] Env runner 0, rollouts 800: timing wait_actor: 0.0000, waiting: 0.5348, reset: 14.2725, save_policy_outputs: 0.9687, env_step: 36.1914, overhead: 3.6750, complete_rollouts: 0.0151, enqueue_policy_requests: 0.1582, one_step: 0.0157, work: 42.9876
[2020-04-04 02:03:59,275][02463] Policy worker avg. requests 2.98, timing: init: 1.8110, wait_policy_total: 14.7359, wait_policy: 0.0051, handle_policy_step: 41.7147, one_step: 0.0000, deserialize: 1.3855, obs_to_device: 5.1736, stack: 13.7928, forward_encoder: 7.4016, forward: 15.6072, postprocess: 4.7609, weight_update: 0.0005
[2020-04-04 02:03:59,385][02449] GPU learner timing: extract: 0.1784, buffers: 0.0654, batching: 5.0967, buff_ready: 0.2283, tensors_gpu_float: 8.2659, squeeze: 0.0121, prepare: 13.7269, batcher_mem: 5.0500
[2020-04-04 02:03:59,691][02449] Train loop timing: init: 1.3370, train_wait: 0.2528, forward_encoder: 11.0017, forward_head: 11.0051, bptt_initial: 1.0097, bptt_forward_core: 6.6520, bptt_rnn_states: 4.2696, bptt: 11.0963, tail: 0.6196, vtrace: 1.0200, losses: 0.4056, clip: 8.2484, update: 14.1095, train: 42.0111
[2020-04-04 02:03:59,829][02416] Collected {0: 2015232}, FPS: 46444.4
[2020-04-04 02:03:59,829][02416] Timing: experience: 43.3902

Version V84 (replaced report queue with C++ fast queue)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v84_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-04-07 19:08:20,955][03903] Env runner 0, rollouts 800: timing wait_actor: 0.0000, waiting: 0.4329, reset: 12.9667, save_policy_outputs: 0.9610, env_step: 36.3117, overhead: 3.7032, complete_rollouts: 0.0155, enqueue_policy_requests: 0.1662, one_step: 0.0152, work: 43.1195
[2020-04-07 19:08:20,980][03904] Env runner 1, rollouts 800: timing wait_actor: 0.0000, waiting: 0.4846, reset: 10.7025, save_policy_outputs: 0.9715, env_step: 36.2209, overhead: 3.6897, complete_rollouts: 0.0149, enqueue_policy_requests: 0.1649, one_step: 0.0158, work: 43.0722
[2020-04-07 19:08:21,195][03902] Policy worker avg. requests 3.14, timing: init: 1.8839, wait_policy_total: 14.6510, wait_policy: 0.0051, handle_policy_step: 41.6352, one_step: 0.0000, deserialize: 1.3962, obs_to_device: 5.0880, stack: 13.6252, forward: 15.4521, postprocess: 5.0110, weight_update: 0.0004
[2020-04-07 19:08:21,300][03885] GPU learner timing: extract: 0.1786, buffers: 0.0659, batching: 5.1178, buff_ready: 0.2293, tensors_gpu_float: 7.9821, squeeze: 0.0124, prepare: 13.4654, batcher_mem: 5.0712
[2020-04-07 19:08:21,607][03885] Train loop timing: init: 1.3035, train_wait: 0.2528, forward_head: 9.9191, bptt_initial: 1.0980, bptt_forward_core: 7.1492, bptt_rnn_states: 4.6381, bptt: 11.9665, tail: 0.5081, vtrace: 1.2516, losses: 0.4508, clip: 7.8777, update: 14.1513, train: 42.0710
[2020-04-07 19:08:21,748][03845] Collected {0: 2015232}, FPS: 46475.5
[2020-04-07 19:08:21,748][03845] Timing: experience: 43.3612

Version V85 (remove macro_batch parameter)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v85_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-04-09 00:17:45,919][08128] Env runner 0, rollouts 820: timing wait_actor: 0.0000, waiting: 0.4283, reset: 12.1576, save_policy_outputs: 0.9939, env_step: 36.2774, overhead: 3.7866, complete_rollouts: 0.0152, enqueue_policy_requests: 0.1750, one_step: 0.0148, work: 43.2697
[2020-04-09 00:17:45,929][08129] Env runner 1, rollouts 780: timing wait_actor: 0.0000, waiting: 0.4069, reset: 14.8796, save_policy_outputs: 0.9644, env_step: 36.5389, overhead: 3.6801, complete_rollouts: 0.0143, enqueue_policy_requests: 0.1543, one_step: 0.0150, work: 43.3091
[2020-04-09 00:17:46,168][08127] Policy worker avg. requests 2.98, timing: init: 1.8611, wait_policy_total: 17.1872, wait_policy: 0.0051, handle_policy_step: 41.7487, one_step: 0.0000, deserialize: 1.4023, obs_to_device: 5.1597, stack: 13.6694, forward: 15.5434, postprocess: 4.9102, weight_update: 0.0005
[2020-04-09 00:17:46,276][08108] GPU learner timing: extract: 0.1817, buffers: 0.0662, batching: 5.1368, buff_ready: 0.2489, tensors_gpu_float: 7.7053, squeeze: 0.0106, prepare: 13.2235, batcher_mem: 5.0908
[2020-04-09 00:17:46,582][08108] Train loop timing: init: 1.3731, train_wait: 0.4299, forward_head: 9.1451, bptt_initial: 1.1248, bptt_forward_core: 7.7663, bptt_rnn_states: 4.9050, bptt: 12.8538, tail: 0.8713, vtrace: 1.1398, losses: 0.3700, clip: 8.0898, update: 14.1737, train: 42.0695
[2020-04-09 00:17:46,725][08069] Collected {0: 2015232}, FPS: 46042.6
[2020-04-09 00:17:46,725][08069] Timing: experience: 43.5909
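
Since each version records the exact benchmark command, differences in flags between two runs can be checked mechanically, for example to confirm that only `--macro_batch` and the experiment name changed in V85. This is a sketch under the assumption that every flag has the `--name=value` form used in the commands above; `parse_flags` and `diff_flags` are hypothetical helpers.

```python
def parse_flags(command: str) -> dict:
    """Collect --name=value tokens from a command line into a dict."""
    flags = {}
    for token in command.split():
        if token.startswith("--") and "=" in token:
            name, value = token[2:].split("=", 1)
            flags[name] = value
    return flags


def diff_flags(old: str, new: str) -> dict:
    """Map each changed flag to (old_value, new_value); None means absent."""
    a, b = parse_flags(old), parse_flags(new)
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


# Abbreviated versions of the V84 and V85 commands (most flags omitted).
v84 = "--rollout=32 --macro_batch=2048 --batch_size=2048 --experiment=doom_battle_appo_v84_test"
v85 = "--rollout=32 --batch_size=2048 --experiment=doom_battle_appo_v85_test"
print(diff_flags(v84, v85))
```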

Version V86 (new DMLab reward calculation, PBT changes)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v86_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-04-10 00:37:59,189][10337] Env runner 0, rollouts 800: timing wait_actor: 0.0000, waiting: 0.4441, reset: 17.0015, save_policy_outputs: 0.9655, env_step: 36.0246, overhead: 3.7659, complete_rollouts: 0.0157, enqueue_policy_requests: 0.1704, one_step: 0.0155, work: 42.9231
[2020-04-10 00:37:59,219][10339] Env runner 1, rollouts 800: timing wait_actor: 0.0000, waiting: 0.4638, reset: 17.8751, save_policy_outputs: 0.9556, env_step: 36.1600, overhead: 3.6650, complete_rollouts: 0.0151, enqueue_policy_requests: 0.1646, one_step: 0.0130, work: 42.9417
[2020-04-10 00:37:59,444][10336] Policy worker avg. requests 2.18, timing: init: 1.9154, wait_policy_total: 17.0760, wait_policy: 0.0026, handle_policy_step: 41.4865, one_step: 0.0019, deserialize: 1.3896, obs_to_device: 5.1518, stack: 13.6961, forward: 15.5052, postprocess: 4.8502, weight_update: 0.0007
[2020-04-10 00:37:59,549][10319] GPU learner timing: extract: 0.1794, buffers: 0.0666, batching: 5.0779, buff_ready: 0.2536, tensors_gpu_float: 8.0384, squeeze: 0.0108, prepare: 13.5054, batcher_mem: 5.0249
[2020-04-10 00:37:59,855][10319] Train loop timing: init: 1.3496, train_wait: 0.2624, forward_head: 10.3642, bptt_initial: 1.0378, bptt_forward_core: 7.8731, bptt_rnn_states: 5.0327, bptt: 13.0921, tail: 0.5914, vtrace: 0.9438, losses: 0.3258, clip: 7.5854, update: 13.6748, train: 42.2079
[2020-04-10 00:38:00,016][10286] Collected {0: 2015232}, FPS: 46529.9
[2020-04-10 00:38:00,016][10286] Timing: experience: 43.1344

Version V87 (non-shared actor critic option, lots of changes for quadrotors)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v87_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-04-13 00:48:11,323][04485] Env runner 0, rollouts 800: timing wait_actor: 0.0000, waiting: 0.5039, reset: 13.6255, save_policy_outputs: 1.0308, env_step: 36.6517, overhead: 3.7472, complete_rollouts: 0.0197, enqueue_policy_requests: 0.1617, one_step: 0.0152, work: 43.5982
[2020-04-13 00:48:11,323][04487] Env runner 1, rollouts 780: timing wait_actor: 0.0000, waiting: 0.6506, reset: 15.7767, save_policy_outputs: 0.9567, env_step: 36.6865, overhead: 3.6654, complete_rollouts: 0.0148, enqueue_policy_requests: 0.1648, one_step: 0.0150, work: 43.4545
[2020-04-13 00:48:11,578][04484] Policy worker avg. requests 2.96, timing: init: 1.8653, wait_policy_total: 17.3022, wait_policy: 0.0051, handle_policy_step: 42.1872, one_step: 0.0000, deserialize: 1.3958, obs_to_device: 5.1257, stack: 13.9414, forward: 15.4949, postprocess: 4.8711, weight_update: 0.0040
[2020-04-13 00:48:11,685][04464] GPU learner timing: extract: 0.1818, buffers: 0.0666, batching: 5.2302, buff_ready: 0.2432, tensors_gpu_float: 7.9712, squeeze: 0.0082, prepare: 13.5848, batcher_mem: 5.1797
[2020-04-13 00:48:11,991][04464] Train loop timing: init: 1.3819, train_wait: 0.3030, forward_head: 9.0646, bptt_initial: 0.9873, bptt_forward_core: 7.6174, bptt_rnn_states: 4.7843, bptt: 12.5837, tail: 0.6512, vtrace: 1.2610, losses: 0.3819, clip: 9.0263, update: 14.9522, train: 42.3485
[2020-04-13 00:48:12,147][04426] Collected {0: 2015232}, FPS: 45759.9
[2020-04-13 00:48:12,147][04426] Timing: experience: 43.8603

Version V89 (added pinned memory)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v89_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-04-16 23:35:46,312][24861] Env runner 0, rollouts 800: timing wait_actor: 0.0000, waiting: 0.9294, reset: 15.1188, save_policy_outputs: 0.9652, env_step: 36.1615, overhead: 3.7119, complete_rollouts: 0.0154, enqueue_policy_requests: 0.1591, one_step: 0.0150, work: 42.9982
[2020-04-16 23:35:46,315][24862] Env runner 1, rollouts 780: timing wait_actor: 0.0000, waiting: 0.7218, reset: 14.1896, save_policy_outputs: 1.0339, env_step: 36.0976, overhead: 3.8220, complete_rollouts: 0.0149, enqueue_policy_requests: 0.1786, one_step: 0.0163, work: 43.2035
[2020-04-16 23:35:46,559][24860] Policy worker avg. requests 4.24, timing: init: 1.8025, wait_policy_total: 15.8683, wait_policy: 0.0002, handle_policy_step: 42.0863, one_step: 0.0015, deserialize: 1.4354, obs_to_device: 5.3716, stack: 14.5457, forward: 14.7231, postprocess: 4.8468, weight_update: 0.0004
[2020-04-16 23:35:46,668][24846] GPU learner timing: extract: 0.1800, buffers: 0.0652, batching: 5.1039, buff_ready: 0.2408, tensors_gpu_float: 6.9406, squeeze: 0.0070, prepare: 12.4138, batcher_mem: 5.0034
[2020-04-16 23:35:46,975][24846] Train loop timing: init: 1.3649, train_wait: 0.2559, forward_head: 9.9435, bptt_initial: 1.1235, bptt_forward_core: 7.0334, bptt_rnn_states: 4.3835, bptt: 11.5951, tail: 0.6953, vtrace: 1.6279, losses: 0.3889, clip: 10.1573, update: 14.3870, train: 42.5496
[2020-04-16 23:35:47,147][24811] Workers joined!
[2020-04-16 23:35:47,157][24811] Collected {0: 2015232}, FPS: 46004.9
[2020-04-16 23:35:47,157][24811] Timing: experience: 43.6267

Version V90 (added ability to train on CPU as well)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v90_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-04-23 19:12:12,927][30120] Env runner 0, rollouts 780: timing wait_actor: 0.0000, waiting: 0.6942, reset: 14.9306, save_policy_outputs: 0.9456, env_step: 35.5135, overhead: 3.5594, complete_rollouts: 0.0152, enqueue_policy_requests: 0.1761, one_step: 0.0168, work: 42.1509
[2020-04-23 19:12:12,934][30124] Env runner 1, rollouts 820: timing wait_actor: 0.0000, waiting: 0.6884, reset: 14.4668, save_policy_outputs: 1.0144, env_step: 35.3751, overhead: 3.5973, complete_rollouts: 0.0161, enqueue_policy_requests: 0.1624, one_step: 0.0148, work: 42.1716
[2020-04-23 19:12:13,174][30119] Policy worker avg. requests 2.94, timing: init: 1.9893, wait_policy_total: 15.1174, wait_policy: 0.0008, handle_policy_step: 41.1058, one_step: 0.0037, deserialize: 1.4255, obs_to_device: 5.4478, stack: 13.9475, forward: 15.0436, postprocess: 4.8951, weight_update: 0.0005
[2020-04-23 19:12:13,283][30106] GPU learner timing: extract: 0.1877, buffers: 0.0664, batching: 5.0616, buff_ready: 0.2413, tensors_gpu_float: 5.8271, squeeze: 0.0056, prepare: 11.2599, batcher_mem: 4.9639
[2020-04-23 19:12:13,589][30106] Train loop timing: init: 1.3502, train_wait: 0.2526, forward_head: 9.8003, bptt_initial: 1.1674, bptt_forward_core: 7.8029, bptt_rnn_states: 4.9400, bptt: 12.9313, tail: 0.4384, vtrace: 1.4388, losses: 0.3578, clip: 8.5222, update: 12.6351, train: 41.3112
[2020-04-23 19:12:13,759][30073] Collected {0: 2015232}, FPS: 47151.6
[2020-04-23 19:12:13,760][30073] Timing: experience: 42.5657

Version V92 (new mechanism to control experience collection rate, unlock GIL in C++ queue)
A minor slowdown is expected, mostly at the very beginning of the run.
[2020-04-27 03:19:56,923][31305] Env runner 1, rollouts 800: timing wait_actor: 0.0000, waiting: 1.8314, reset: 12.8639, save_policy_outputs: 0.9403, env_step: 35.6297, overhead: 3.5156, complete_rollouts: 0.0145, enqueue_policy_requests: 0.1547, one_step: 0.0147, work: 42.1625
[2020-04-27 03:19:56,950][31303] Env runner 0, rollouts 800: timing wait_actor: 0.0000, waiting: 1.7260, reset: 14.4580, save_policy_outputs: 0.9853, env_step: 35.5968, overhead: 3.6064, complete_rollouts: 0.0157, enqueue_policy_requests: 0.1596, one_step: 0.0153, work: 42.2983
[2020-04-27 03:19:57,173][31302] Policy worker avg. requests 3.12, timing: init: 1.8919, wait_policy_total: 16.3687, wait_policy: 0.0051, handle_policy_step: 41.0353, one_step: 0.0000, deserialize: 1.4363, obs_to_device: 5.3862, stack: 13.9244, forward: 14.8523, postprocess: 4.9664, weight_update: 0.0004
[2020-04-27 03:19:57,280][31288] GPU learner timing: extract: 0.1831, buffers: 0.0648, batching: 4.7433, buff_ready: 0.2306, tensors_gpu_float: 1.7101, squeeze: 0.0050, prepare: 6.8177, batcher_mem: 4.6762
[2020-04-27 03:19:57,613][31288] Train loop timing: init: 1.2959, train_wait: 0.4145, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.4530, bptt_initial: 0.0178, bptt_forward_core: 0.8444, bptt_rnn_states: 0.1981, bptt: 1.1631, tail: 0.2899, vtrace: 0.8977, clip: 6.3809, update: 10.1215, after_optimizer: 0.0815, losses: 10.4773, train: 15.6995
[2020-04-27 03:19:57,754][31256] Collected {0: 2015232}, FPS: 46069.8
[2020-04-27 03:19:57,754][31256] Timing: experience: 43.5652
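
To compare per-stage timings between versions, the timing lines can be parsed into dictionaries and subtracted. The sketch below is only an illustration: `parse_timing` and `timing_delta` are hypothetical helpers (not part of the codebase), and it assumes every `name: value` pair follows the word `timing` on the line. The two sample lines are the GPU learner lines from V90 and V92 above.

```python
import re

# Matches "stage_name: 1.2345" pairs such as "tensors_gpu_float: 5.8271".
PAIR = re.compile(r"(\w+):\s*([0-9]+(?:\.[0-9]+)?)")


def parse_timing(line):
    """Return {stage: seconds} for one timing log line, or {} if it has none."""
    _, sep, tail = line.partition("timing")
    if not sep:
        return {}
    return {name: float(value) for name, value in PAIR.findall(tail)}


def timing_delta(old, new):
    """Return new - old for every stage present in both dictionaries."""
    return {k: round(new[k] - old[k], 4) for k in old if k in new}


v90_line = "[2020-04-23 19:12:13,283][30106] GPU learner timing: extract: 0.1877, buffers: 0.0664, batching: 5.0616, buff_ready: 0.2413, tensors_gpu_float: 5.8271, squeeze: 0.0056, prepare: 11.2599, batcher_mem: 4.9639"
v92_line = "[2020-04-27 03:19:57,280][31288] GPU learner timing: extract: 0.1831, buffers: 0.0648, batching: 4.7433, buff_ready: 0.2306, tensors_gpu_float: 1.7101, squeeze: 0.0050, prepare: 6.8177, batcher_mem: 4.6762"

v90 = parse_timing(v90_line)
v92 = parse_timing(v92_line)
delta = timing_delta(v90, v92)

# Print stages ordered from the largest speed-up to the largest slowdown.
for stage, change in sorted(delta.items(), key=lambda kv: kv[1]):
    print(f"{stage:>20}: {change:+.4f}")
```

Lines without the lowercase word `timing` (for example the `Collected ... FPS` line) yield an empty dictionary, so a whole log file can be fed through `parse_timing` line by line.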

Version V93 (fixed the mul_ issue in the learner loop, slightly reworked action distributions)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v93_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-04-30 00:40:06,303][21412] Env runner 0, rollouts 800: timing wait_actor: 0.0000, waiting: 1.4730, reset: 15.9357, save_policy_outputs: 0.9807, env_step: 35.6301, overhead: 3.5418, complete_rollouts: 0.0152, enqueue_policy_requests: 0.1777, one_step: 0.0185, work: 42.2857
[2020-04-30 00:40:06,309][21413] Env runner 1, rollouts 780: timing wait_actor: 0.0000, waiting: 1.5991, reset: 16.8050, save_policy_outputs: 0.9356, env_step: 35.6926, overhead: 3.4544, complete_rollouts: 0.0145, enqueue_policy_requests: 0.1600, one_step: 0.0151, work: 42.1568
[2020-04-30 00:40:06,541][21411] Policy worker avg. requests 3.62, timing: init: 1.8576, wait_policy_total: 16.5957, wait_policy: 0.0000, handle_policy_step: 40.9567, one_step: 0.0022, deserialize: 1.4473, obs_to_device: 5.3679, stack: 13.9106, forward: 14.9962, postprocess: 4.8318, weight_update: 0.0005
[2020-04-30 00:40:06,650][21397] GPU learner timing: extract: 0.1846, buffers: 0.0664, batching: 4.7068, buff_ready: 0.2618, tensors_gpu_float: 1.7512, squeeze: 0.0067, prepare: 6.8604, batcher_mem: 4.6318
[2020-04-30 00:40:06,957][21397] Train loop timing: init: 1.3771, train_wait: 0.4144, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.4586, bptt_initial: 0.0177, bptt_forward_core: 0.8403, bptt_rnn_states: 0.2238, bptt: 1.1868, tail: 0.2921, vtrace: 0.8646, clip: 6.3309, update: 9.9849, after_optimizer: 0.0954, losses: 10.3387, train: 15.4936
[2020-04-30 00:40:07,139][21362] Collected {0: 2015232}, FPS: 46308.9
[2020-04-30 00:40:07,139][21362] Timing: experience: 43.3403

Version V95 (threadpoolctl)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v95_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-05-07 00:20:28,984][24986] Env runner 0, CPU aff. [0], rollouts 800: timing wait_actor: 0.0000, waiting: 1.5409, reset: 15.0362, save_policy_outputs: 0.9470, env_step: 35.7530, overhead: 3.6169, complete_rollouts: 0.0151, enqueue_policy_requests: 0.1806, one_step: 0.0148, work: 42.4392
[2020-05-07 00:20:28,993][24987] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 1.6076, reset: 12.7877, save_policy_outputs: 0.9770, env_step: 35.7734, overhead: 3.5536, complete_rollouts: 0.0156, enqueue_policy_requests: 0.1630, one_step: 0.0146, work: 42.3769
[2020-05-07 00:20:29,232][24985] Policy worker avg. requests 3.34, timing: init: 1.7801, wait_policy_total: 15.0389, wait_policy: 0.0051, handle_policy_step: 40.9999, one_step: 0.0000, deserialize: 1.4367, obs_to_device: 5.3197, stack: 13.9150, forward: 14.8686, postprocess: 4.8007, weight_update: 0.0005
[2020-05-07 00:20:29,339][24965] GPU learner timing: extract: 0.1923, buffers: 0.0661, batching: 4.7009, buff_ready: 0.2363, tensors_gpu_float: 1.5160, squeeze: 0.0051, prepare: 6.5952, batcher_mem: 4.6232
[2020-05-07 00:20:29,647][24965] Train loop timing: init: 1.3167, train_wait: 0.3439, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.4390, bptt_initial: 0.0178, bptt_forward_core: 0.8403, bptt_rnn_states: 0.2221, bptt: 1.1842, tail: 0.2767, vtrace: 0.8588, losses: 0.2639, clip: 6.2482, update: 9.9498, after_optimizer: 0.1336, train: 15.5308
[2020-05-07 00:20:29,819][24921] Collected {0: 2015232}, FPS: 45853.0
[2020-05-07 00:20:29,819][24921] Timing: experience: 43.7712

Version V96 (min num requests on policy worker)
[2020-05-09 03:14:52,420][16416] Env runner 0, CPU aff. [0], rollouts 800: timing wait_actor: 0.0000, waiting: 1.2325, reset: 11.7978, save_policy_outputs: 0.9999, env_step: 35.6990, overhead: 3.7528, complete_rollouts: 0.0160, enqueue_policy_requests: 0.1988, one_step: 0.0151, work: 42.6707
[2020-05-09 03:14:52,436][16417] Env runner 1, CPU aff. [1], rollouts 800: timing wait_actor: 0.0000, waiting: 1.2668, reset: 14.4219, save_policy_outputs: 0.9878, env_step: 35.8100, overhead: 3.6445, complete_rollouts: 0.0156, enqueue_policy_requests: 0.2179, one_step: 0.0155, work: 42.6389
[2020-05-09 03:14:52,682][16415] Policy worker avg. requests 6.66, timing: init: 1.7922, wait_policy_total: 13.1051, wait_policy: 0.0051, handle_policy_step: 33.6309, one_step: 0.0018, deserialize: 1.1954, obs_to_device: 4.3513, stack: 11.7198, forward: 11.2005, postprocess: 4.1313, weight_update: 0.0005
[2020-05-09 03:14:52,778][16392] GPU learner timing: extract: 0.1836, buffers: 0.0659, batching: 4.6835, buff_ready: 0.2483, tensors_gpu_float: 1.6530, squeeze: 0.0051, prepare: 6.7244, batcher_mem: 4.5886
[2020-05-09 03:14:53,086][16392] Train loop timing: init: 1.3396, train_wait: 0.3579, epoch_init: 0.0013, minibatch_init: 0.0006, forward_head: 0.4487, bptt_initial: 0.0198, bptt_forward_core: 0.8462, bptt_rnn_states: 0.2294, bptt: 1.1992, tail: 0.2761, vtrace: 0.8823, losses: 0.2344, clip: 6.2738, update: 10.0010, after_optimizer: 0.0852, train: 15.5468
[2020-05-09 03:14:53,250][16344] Collected {0: 2015232}, FPS: 46004.4
[2020-05-09 03:14:53,250][16344] Timing: experience: 43.6271
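
In the V96 log the average number of requests per policy step rises from 3.34 (V95) to 6.66, while handle_policy_step drops from 40.9999 to 33.6309. One way a minimum batch size of this kind could be enforced is sketched below. This is an assumption-laden illustration, not the project's implementation: `collect_requests`, `min_num_requests` and `max_wait_s` are hypothetical names, and a standard `queue.Queue` stands in for whatever queue the policy worker actually uses.

```python
import queue
import time


def collect_requests(q, min_num_requests, max_wait_s=0.005):
    """Drain q, waiting up to max_wait_s for at least min_num_requests items."""
    requests = []
    deadline = time.monotonic() + max_wait_s
    while True:
        # Take everything that is already queued without blocking.
        try:
            while True:
                requests.append(q.get_nowait())
        except queue.Empty:
            pass

        remaining = deadline - time.monotonic()
        if len(requests) >= min_num_requests or remaining <= 0:
            return requests

        # Block briefly for the next request instead of busy-waiting.
        try:
            requests.append(q.get(timeout=remaining))
        except queue.Empty:
            return requests


if __name__ == "__main__":
    q = queue.Queue()
    for i in range(5):
        q.put(i)
    print(collect_requests(q, min_num_requests=3))
```

The timeout keeps the worker from stalling when fewer than `min_num_requests` environments are waiting; the trade-off is slightly higher latency per request in exchange for larger forward-pass batches.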

Version V97 (changed observation scaling for VizDoom from [-1,1] to [0,1])
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v97_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-05-18 20:03:54,747][19288] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 1.1639, reset: 14.3063, save_policy_outputs: 0.9860, env_step: 35.5967, overhead: 3.7164, complete_rollouts: 0.0164, enqueue_policy_requests: 0.2321, one_step: 0.0152, work: 42.5622
[2020-05-18 20:03:54,757][19289] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 1.2497, reset: 15.2774, save_policy_outputs: 0.9574, env_step: 35.5351, overhead: 3.7291, complete_rollouts: 0.0206, enqueue_policy_requests: 0.2408, one_step: 0.0153, work: 42.4875
[2020-05-18 20:03:54,997][19287] Policy worker avg. requests 6.92, timing: init: 1.8965, wait_policy_total: 13.0253, wait_policy: 0.0051, handle_policy_step: 33.3739, one_step: 0.0052, deserialize: 1.1815, obs_to_device: 4.3630, stack: 11.7451, forward: 11.0673, postprocess: 4.0783, weight_update: 0.0005
[2020-05-18 20:03:55,098][19275] GPU learner timing: extract: 0.1850, buffers: 0.0664, batching: 4.6744, buff_ready: 0.2426, tensors_gpu_float: 1.5915, squeeze: 0.0084, prepare: 6.6508, batcher_mem: 4.5997
[2020-05-18 20:03:55,404][19275] Train loop timing: init: 1.3000, train_wait: 0.4041, epoch_init: 0.0013, minibatch_init: 0.0006, forward_head: 0.4497, bptt_initial: 0.0182, bptt_forward_core: 0.8330, bptt_rnn_states: 0.2242, bptt: 1.1801, tail: 0.2728, vtrace: 0.8815, losses: 0.2466, clip: 6.2452, update: 9.8733, after_optimizer: 0.0879, train: 14.9738
[2020-05-18 20:03:55,558][19245] Collected {0: 2015232}, FPS: 46254.9
[2020-05-18 20:03:55,558][19245] Timing: experience: 43.3909

Version V98 (fast C++ queue is now in pip package)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v98_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-06-03 02:11:53,796][15627] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 1.2446, reset: 12.5371, save_policy_outputs: 1.0480, env_step: 36.0199, overhead: 3.6189, complete_rollouts: 0.0135, enqueue_policy_requests: 0.2274, one_step: 0.0156, work: 42.8733
[2020-06-03 02:11:53,829][15626] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 1.0752, reset: 13.3548, save_policy_outputs: 0.9588, env_step: 36.1028, overhead: 3.7644, complete_rollouts: 0.0147, enqueue_policy_requests: 0.2207, one_step: 0.0537, work: 43.0754
[2020-06-03 02:11:54,036][15625] Policy worker avg. requests 6.96, timing: init: 1.8361, wait_policy_total: 13.6547, wait_policy: 0.0033, handle_policy_step: 33.5095, one_step: 0.0030, deserialize: 1.2255, obs_to_device: 4.3051, stack: 11.6140, forward: 11.2155, postprocess: 4.1492, weight_update: 0.0006
[2020-06-03 02:11:54,134][15609] GPU learner timing: extract: 0.1874, buffers: 0.0648, batching: 4.6424, buff_ready: 0.2155, tensors_gpu_float: 1.5646, squeeze: 0.0050, prepare: 6.5579, batcher_mem: 4.5717
[2020-06-03 02:11:54,442][15609] Train loop timing: init: 1.3703, train_wait: 0.4377, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.4494, bptt_initial: 0.0188, bptt_forward_core: 0.8428, bptt_rnn_states: 0.2186, bptt: 1.1842, tail: 0.2836, vtrace: 0.9056, losses: 0.2344, clip: 6.1966, update: 9.8436, after_optimizer: 0.0976, train: 14.9817
[2020-06-03 02:11:54,602][15570] Collected {0: 2015232}, FPS: 45932.5
[2020-06-03 02:11:54,602][15570] Timing: experience: 43.6955

Version V100 (refactoring and examples)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v100_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-06-16 23:54:23,254][28570] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 1.1134, reset: 16.2731, save_policy_outputs: 0.9482, env_step: 36.1469, overhead: 3.6420, complete_rollouts: 0.0160, enqueue_policy_requests: 0.2307, one_step: 0.0155, work: 42.9447
[2020-06-16 23:54:23,259][28571] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 1.1335, reset: 17.3343, save_policy_outputs: 0.9512, env_step: 36.1436, overhead: 3.6017, complete_rollouts: 0.0150, enqueue_policy_requests: 0.2387, one_step: 0.0152, work: 42.9323
[2020-06-16 23:54:23,506][28569] Policy worker avg. requests 6.54, timing: init: 1.8084, wait_policy_total: 14.7985, wait_policy: 0.0051, handle_policy_step: 33.4174, one_step: 0.0023, deserialize: 1.2258, obs_to_device: 4.2974, stack: 11.6298, forward: 11.2171, postprocess: 4.1266, weight_update: 0.0005
[2020-06-16 23:54:23,601][28549] GPU learner timing: extract: 0.1908, buffers: 0.0643, batching: 4.7067, buff_ready: 0.2377, tensors_gpu_float: 1.6369, squeeze: 0.0049, prepare: 6.7185, batcher_mem: 4.6372
[2020-06-16 23:54:23,909][28549] Train loop timing: init: 1.3621, train_wait: 0.3706, epoch_init: 0.0011, minibatch_init: 0.0007, forward_head: 0.4432, bptt_initial: 0.0201, bptt_forward_core: 0.8591, bptt_rnn_states: 0.2214, bptt: 1.2034, tail: 0.2743, vtrace: 0.9086, losses: 0.2391, clip: 6.2629, update: 9.9659, after_optimizer: 0.0921, train: 15.1274
[2020-06-16 23:54:24,086][28510] Collected {0: 2015232}, FPS: 45758.3
[2020-06-16 23:54:24,086][28510] Timing: experience: 43.8618

Version V101 (contiguous tensor in models)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v101_test --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-07-22 20:35:09,474][27996] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 1.1467, reset: 15.6146, save_policy_outputs: 0.9717, env_step: 36.1261, overhead: 3.7894, complete_rollouts: 0.0151, enqueue_policy_requests: 0.2367, one_step: 0.0153, work: 43.0991
[2020-07-22 20:35:09,498][27994] Env runner 0, CPU aff. [0], rollouts 800: timing wait_actor: 0.0128, waiting: 1.1866, reset: 17.1803, save_policy_outputs: 0.9691, env_step: 36.1947, overhead: 3.7123, complete_rollouts: 0.0164, enqueue_policy_requests: 0.2368, one_step: 0.0474, work: 43.0848
[2020-07-22 20:35:09,718][27993] Policy worker avg. requests 6.86, timing: init: 1.8715, wait_policy_total: 14.2155, wait_policy: 0.0051, handle_policy_step: 33.5834, one_step: 0.0035, deserialize: 1.1937, obs_to_device: 4.3637, stack: 11.8305, forward: 11.1642, postprocess: 4.1322, weight_update: 0.0004
[2020-07-22 20:35:09,811][27976] GPU learner timing: extract: 0.1896, buffers: 0.0666, batching: 4.6337, buff_ready: 0.2378, tensors_gpu_float: 1.6148, squeeze: 0.0049, prepare: 6.6185, batcher_mem: 4.5605
[2020-07-22 20:35:10,119][27976] Train loop timing: init: 1.3278, train_wait: 0.4263, epoch_init: 0.0011, minibatch_init: 0.0006, forward_head: 0.4501, bptt_initial: 0.0177, bptt_forward_core: 0.8301, bptt_rnn_states: 0.2375, bptt: 1.1895, tail: 0.2771, vtrace: 0.8882, losses: 0.2363, clip: 6.2543, update: 9.8949, after_optimizer: 0.0802, train: 15.0254
[2020-07-22 20:35:10,285][27940] Collected {0: 2015232}, FPS: 45531.6
[2020-07-22 20:35:10,285][27940] Timing: experience: 44.0802

Version V103 (packed RNN, faster learner)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v103 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-08-08 03:02:50,393][07993] Env runner 0, CPU aff. [0], rollouts 800: timing wait_actor: 0.0000, waiting: 0.8809, reset: 13.7800, save_policy_outputs: 0.9519, env_step: 35.9197, overhead: 3.5910, complete_rollouts: 0.0153, enqueue_policy_requests: 0.2150, one_step: 0.0156, work: 42.6455
[2020-08-08 03:02:50,437][07994] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 0.8950, reset: 12.1745, save_policy_outputs: 0.9383, env_step: 35.9078, overhead: 3.6172, complete_rollouts: 0.0166, enqueue_policy_requests: 0.1953, one_step: 0.0211, work: 42.6833
[2020-08-08 03:02:50,667][07992] Policy worker avg. requests 6.78, timing: init: 1.8202, wait_policy_total: 13.1564, wait_policy: 0.0051, handle_policy_step: 34.4273, one_step: 0.0000, deserialize: 1.1965, obs_to_device: 4.3501, stack: 11.6018, forward: 12.6697, postprocess: 4.1192, weight_update: 0.0009
[2020-08-08 03:02:50,759][07978] GPU learner timing: extract: 0.1866, buffers: 0.0659, batching: 4.7246, buff_ready: 0.2447, tensors_gpu_float: 1.6641, squeeze: 0.0072, prepare: 6.7815, batcher_mem: 4.6340
[2020-08-08 03:02:51,065][07978] Train loop timing: init: 1.3798, train_wait: 0.3100, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.3028, bptt_initial: 2.6581, bptt_forward_core: 0.3173, bptt: 0.3282, tail: 0.2402, vtrace: 0.8758, losses: 0.2186, clip: 6.4882, update: 8.3101, after_optimizer: 0.0808, train: 13.3027
[2020-08-08 03:02:51,202][07943] Collected {0: 2015232}, FPS: 46428.6
[2020-08-08 03:02:51,202][07943] Timing: experience: 43.2286

Version V104 (Fixed packed RNN bugs)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v104 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-08-09 04:11:46,256][18236] Env runner 0, CPU aff. [0], rollouts 760: timing wait_actor: 0.0000, waiting: 0.9439, reset: 13.9935, save_policy_outputs: 0.9516, env_step: 36.4072, overhead: 3.4867, complete_rollouts: 0.0154, enqueue_policy_requests: 0.2004, one_step: 0.0155, work: 42.9089
[2020-08-09 04:11:46,258][18237] Env runner 1, CPU aff. [1], rollouts 800: timing wait_actor: 0.0000, waiting: 0.9304, reset: 17.0426, save_policy_outputs: 1.0489, env_step: 35.9606, overhead: 3.6797, complete_rollouts: 0.0158, enqueue_policy_requests: 0.2029, one_step: 0.0150, work: 42.9334
[2020-08-09 04:11:46,516][18235] Policy worker avg. requests 6.98, timing: init: 1.8550, wait_policy_total: 13.0789, wait_policy: 0.0051, handle_policy_step: 34.3765, one_step: 0.0029, deserialize: 1.2016, obs_to_device: 4.2958, stack: 11.5534, forward: 12.6304, postprocess: 4.1815, weight_update: 0.0005
[2020-08-09 04:11:46,613][18222] GPU learner timing: extract: 0.1901, buffers: 0.0659, batching: 4.6662, buff_ready: 0.2383, tensors_gpu_float: 1.7760, squeeze: 0.0056, prepare: 6.8164, batcher_mem: 4.5960
[2020-08-09 04:11:46,919][18222] Train loop timing: init: 1.4008, train_wait: 0.3700, epoch_init: 0.0011, minibatch_init: 0.0005, forward_head: 0.3178, bptt_initial: 2.6194, bptt_forward_core: 0.3159, bptt: 0.3270, tail: 0.2507, vtrace: 0.8709, losses: 0.2520, clip: 6.6023, update: 8.4175, after_optimizer: 0.1045, train: 13.4457
[2020-08-09 04:11:47,066][18187] Workers joined!
[2020-08-09 04:11:47,077][18187] Collected {0: 2015232}, FPS: 46128.0
[2020-08-09 04:11:47,077][18187] Timing: experience: 43.5102

Version V105 (KL-divergence exploration)
[2020-08-14 01:03:19,185][28163] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 1.0017, reset: 14.2128, save_policy_outputs: 0.9824, env_step: 36.4462, overhead: 3.6529, complete_rollouts: 0.0160, enqueue_policy_requests: 0.1855, one_step: 0.0158, work: 43.2764
[2020-08-14 01:03:19,215][28164] Env runner 1, CPU aff. [1], rollouts 800: timing wait_actor: 0.0000, waiting: 1.0480, reset: 12.1188, save_policy_outputs: 0.9910, env_step: 36.3682, overhead: 3.6898, complete_rollouts: 0.0153, enqueue_policy_requests: 0.1810, one_step: 0.0188, work: 43.2639
[2020-08-14 01:03:19,450][28162] Policy worker avg. requests 6.36, timing: init: 2.0099, wait_policy_total: 13.3693, wait_policy: 0.0051, handle_policy_step: 34.6174, one_step: 0.0039, deserialize: 1.2006, obs_to_device: 4.3295, stack: 11.5226, forward: 12.8290, to_cpu: 2.7540, format_outputs: 1.1375, postprocess: 4.2561, weight_update: 0.0005
[2020-08-14 01:03:19,544][28148] GPU learner timing: extract: 0.1886, buffers: 0.0676, batching: 4.6875, buff_ready: 0.2238, tensors_gpu_float: 1.6100, squeeze: 0.0056, prepare: 6.6578, batcher_mem: 4.6076
[2020-08-14 01:03:19,851][28148] Train loop timing: init: 1.4439, train_wait: 0.3581, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.3226, bptt_initial: 2.6679, bptt_forward_core: 0.3080, bptt: 0.3193, tail: 0.2493, vtrace: 0.8909, losses: 0.2339, clip: 6.4879, update: 8.2991, after_optimizer: 0.0837, train: 13.3566
[2020-08-14 01:03:20,017][28113] Collected {0: 2015232}, FPS: 45756.6
[2020-08-14 01:03:20,017][28113] Timing: experience: 43.8634

Version V107 (faster categorical distributions)
[2020-09-11 19:10:38,111][01603] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 0.9135, reset: 14.1597, save_policy_outputs: 0.9308, env_step: 36.0150, overhead: 3.6331, prepare_next_step: 1.3416, complete_rollouts: 0.0140, enqueue_policy_requests: 0.2695, one_step: 0.0153, work: 42.8347
[2020-09-11 19:10:38,112][01604] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 0.8913, reset: 14.5501, save_policy_outputs: 0.9838, env_step: 35.9917, overhead: 3.6282, prepare_next_step: 1.4316, complete_rollouts: 0.0131, enqueue_policy_requests: 0.2362, one_step: 0.0151, work: 42.8679
[2020-09-11 19:10:38,387][01602] Policy worker avg. requests 6.68, timing: init: 1.8235, wait_policy_total: 14.9073, wait_policy: 0.0051, handle_policy_step: 32.2519, one_step: 0.0000, deserialize: 1.2188, obs_to_device: 4.3302, stack: 11.4334, forward: 10.4606, to_cpu: 3.0993, format_outputs: 1.1230, postprocess: 3.9854, weight_update: 0.0006
[2020-09-11 19:10:38,475][01588] GPU learner timing: extract: 0.1846, buffers: 0.0656, batching: 4.6244, buff_ready: 0.2338, tensors_gpu_float: 1.6586, squeeze: 0.0055, prepare: 6.6505, batcher_mem: 4.5485
[2020-09-11 19:10:38,781][01588] Train loop timing: init: 1.4011, train_wait: 0.4821, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.3063, bptt_initial: 2.6546, bptt_forward_core: 0.3205, bptt: 0.3322, tail: 0.1785, vtrace: 0.8624, losses: 0.2237, clip: 6.5730, update: 8.3336, after_optimizer: 0.0781, train: 13.3056
[2020-09-11 19:10:38,922][01534] Collected {0: 2015232}, FPS: 46145.4
[2020-09-11 19:10:38,922][01534] Timing: experience: 43.4938

Version V108 (fixed entropy bug)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v108 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-09-13 00:48:54,254][25313] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 0.7495, reset: 14.3644, save_policy_outputs: 0.9754, env_step: 36.1489, overhead: 3.6771, prepare_next_step: 1.3667, complete_rollouts: 0.0138, enqueue_policy_requests: 0.2293, one_step: 0.0152, work: 43.0335
[2020-09-13 00:48:54,269][25314] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 0.8097, reset: 13.5597, save_policy_outputs: 0.9443, env_step: 36.1325, overhead: 3.6884, prepare_next_step: 1.3883, complete_rollouts: 0.0264, enqueue_policy_requests: 0.2110, one_step: 0.0152, work: 42.9949
[2020-09-13 00:48:54,516][25312] Policy worker avg. requests 6.54, timing: init: 2.0546, wait_policy_total: 14.6246, wait_policy: 0.0051, handle_policy_step: 32.4529, one_step: 0.0033, deserialize: 1.2098, obs_to_device: 4.3389, stack: 11.5308, forward: 10.4533, to_cpu: 3.1808, format_outputs: 1.0879, postprocess: 4.0375, weight_update: 0.0006
[2020-09-13 00:48:54,618][25298] GPU learner timing: extract: 0.1841, buffers: 0.0643, batching: 4.6550, buff_ready: 0.2414, tensors_gpu_float: 1.7010, squeeze: 0.0056, prepare: 6.7290, batcher_mem: 4.5854
[2020-09-13 00:48:54,924][25298] Train loop timing: init: 1.4083, train_wait: 0.5370, epoch_init: 0.0012, minibatch_init: 0.0005, forward_head: 0.3111, bptt_initial: 2.6444, bptt_forward_core: 0.3150, bptt: 0.3247, tail: 0.1675, vtrace: 0.9004, losses: 0.3488, clip: 6.5036, update: 8.3128, after_optimizer: 0.0848, train: 13.4505
[2020-09-13 00:48:55,081][25263] Collected {0: 2015232}, FPS: 46279.0
[2020-09-13 00:48:55,081][25263] Timing: experience: 43.3682

Version V109 (PyTorch 1.6.0 is the default environment)
[2020-09-16 01:15:07,453][13302] Env runner 0, CPU aff. [0], rollouts 800: timing wait_actor: 0.0000, waiting: 1.1626, reset: 12.3288, save_policy_outputs: 0.9347, env_step: 34.4003, overhead: 3.1593, prepare_next_step: 1.3322, complete_rollouts: 0.0123, enqueue_policy_requests: 0.1939, one_step: 0.0128, work: 40.6330
[2020-09-16 01:15:07,474][13303] Env runner 1, CPU aff. [1], rollouts 800: timing wait_actor: 0.0000, waiting: 1.1050, reset: 16.1093, save_policy_outputs: 0.9673, env_step: 34.5213, overhead: 3.1302, prepare_next_step: 1.3112, complete_rollouts: 0.0143, enqueue_policy_requests: 0.1764, one_step: 0.0170, work: 40.6930
[2020-09-16 01:15:07,707][13301] Policy worker avg. requests 7.26, timing: init: 1.8608, wait_policy_total: 12.0739, wait_policy: 0.0051, handle_policy_step: 33.7882, one_step: 0.0054, deserialize: 1.1357, stack: 6.4923, obs_to_device: 6.8896, forward: 10.3463, to_cpu: 2.7757, format_outputs: 0.9618, postprocess: 4.1559, weight_update: 0.0006
[2020-09-16 01:15:07,814][13285] GPU learner timing: extract: 0.1725, buffers: 0.0586, batching: 4.7156, buff_ready: 0.2059, tensors_gpu_float: 1.7596, squeeze: 0.0059, prepare: 6.8247, batcher_mem: 4.6170
[2020-09-16 01:15:08,120][13285] Train loop timing: init: 1.5387, train_wait: 0.2935, epoch_init: 0.0011, minibatch_init: 0.0005, forward_head: 0.2723, bptt_initial: 2.6277, bptt_forward_core: 0.3213, bptt: 0.3298, tail: 0.1691, vtrace: 0.8107, losses: 0.3124, clip: 5.3533, update: 7.1089, after_optimizer: 0.1429, train: 12.1383
[2020-09-16 01:15:08,275][13253] Workers joined!
[2020-09-16 01:15:08,284][13253] Collected {0: 2015232}, FPS: 48412.1
[2020-09-16 01:15:08,284][13253] Timing: experience: 41.4574

Version V110 (Faster standard conv encoder via torch.jit.script)
Version V111 (Slightly better decorrelation)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v111 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-09-23 01:24:02,087][22532] Env runner 0, CPU aff. [0], rollouts 800: timing wait_actor: 0.0000, waiting: 1.1046, reset: 20.0966, save_policy_outputs: 0.9101, env_step: 34.7202, overhead: 3.1737, prepare_next_step: 1.4131, complete_rollouts: 0.0121, enqueue_policy_requests: 0.1999, one_step: 0.0146, work: 41.0033
[2020-09-23 01:24:02,134][22533] Env runner 1, CPU aff. [1], rollouts 760: timing wait_actor: 0.0000, waiting: 1.2292, reset: 18.7463, save_policy_outputs: 0.9152, env_step: 34.7027, overhead: 3.1941, prepare_next_step: 1.3348, complete_rollouts: 0.0233, enqueue_policy_requests: 0.1900, one_step: 0.0160, work: 40.9426
[2020-09-23 01:24:02,363][22531] Policy worker avg. requests 6.60, timing: init: 2.1233, wait_policy_total: 11.8183, wait_policy: 0.0051, handle_policy_step: 33.6310, one_step: 0.0058, deserialize: 1.1589, stack: 6.6629, obs_to_device: 7.2708, forward: 9.6552, to_cpu: 2.7666, format_outputs: 0.9241, postprocess: 4.1301, weight_update: 0.0006
[2020-09-23 01:24:02,455][22517] GPU learner timing: extract: 0.1733, buffers: 0.0578, batching: 4.6974, buff_ready: 0.2141, tensors_gpu_float: 1.6717, squeeze: 0.0045, prepare: 6.7141, batcher_mem: 4.6000
[2020-09-23 01:24:02,761][22517] Train loop timing: init: 1.5554, train_wait: 0.2996, epoch_init: 0.0011, minibatch_init: 0.0006, forward_head: 0.2578, bptt_initial: 2.6107, bptt_forward_core: 0.3030, bptt: 0.3122, tail: 0.1620, vtrace: 0.7612, losses: 0.2967, clip: 5.2707, update: 7.0722, after_optimizer: 0.2412, train: 12.0869
[2020-09-23 01:24:02,903][22484] Collected {0: 2015232}, FPS: 48082.7
[2020-09-23 01:24:02,903][22484] Timing: experience: 41.7414

Version V112 (refactored reward shaping functionality)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v112 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2020-09-29 01:49:41,866][26877] Env runner 0, CPU aff. [0], rollouts 800: timing wait_actor: 0.0000, waiting: 1.3280, reset: 16.1238, save_policy_outputs: 0.9344, env_step: 34.4164, overhead: 3.1728, prepare_next_step: 1.3545, complete_rollouts: 0.0209, enqueue_policy_requests: 0.2053, one_step: 0.0145, work: 40.6823
[2020-09-29 01:49:41,876][26878] Env runner 1, CPU aff. [1], rollouts 800: timing wait_actor: 0.0000, waiting: 1.2731, reset: 18.6014, save_policy_outputs: 0.9376, env_step: 34.4109, overhead: 3.2620, prepare_next_step: 1.3230, complete_rollouts: 0.0129, enqueue_policy_requests: 0.1906, one_step: 0.0148, work: 40.7327
[2020-09-29 01:49:42,131][26876] Policy worker avg. requests 6.94, timing: init: 1.9720, wait_policy_total: 12.1358, wait_policy: 0.0051, handle_policy_step: 33.2848, one_step: 0.0020, deserialize: 1.1178, stack: 6.4779, obs_to_device: 7.1846, forward: 9.6816, to_cpu: 2.7093, format_outputs: 0.9350, postprocess: 4.1313, weight_update: 0.0006
[2020-09-29 01:49:42,223][26859] GPU learner timing: extract: 0.1646, buffers: 0.0582, batching: 4.5765, buff_ready: 0.2025, tensors_gpu_float: 1.7024, squeeze: 0.0043, prepare: 6.6163, batcher_mem: 4.5062
[2020-09-29 01:49:42,529][26859] Train loop timing: init: 1.5386, train_wait: 0.3641, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.2523, bptt_initial: 2.6347, bptt_forward_core: 0.3256, bptt: 0.3362, tail: 0.1735, vtrace: 0.8181, losses: 0.3091, clip: 5.2569, update: 7.0986, after_optimizer: 0.1639, train: 12.1318
[2020-09-29 01:49:42,694][26826] Collected {0: 2015232}, FPS: 48289.6
[2020-09-29 01:49:42,695][26826] Timing: experience: 41.5626

Version V113 (multi-layer RNN support)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v113 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2021-01-20 15:19:09,671][21353] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 1.2295, reset: 18.4479, save_policy_outputs: 0.9165, env_step: 34.7132, overhead: 3.0937, prepare_next_step: 1.3009, complete_rollouts: 0.0119, enqueue_policy_requests: 0.1892, one_step: 0.0144, work: 40.8158
[2021-01-20 15:19:09,678][21352] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 1.1869, reset: 19.3426, save_policy_outputs: 0.9345, env_step: 34.7421, overhead: 3.1023, prepare_next_step: 1.2818, complete_rollouts: 0.0288, enqueue_policy_requests: 0.1869, one_step: 0.0340, work: 40.8949
[2021-01-20 15:19:09,935][21351] Policy worker avg. requests 7.26, timing: init: 1.8878, wait_policy_total: 12.2714, wait_policy: 0.0051, handle_policy_step: 33.8281, one_step: 0.0000, deserialize: 1.1574, stack: 6.7264, obs_to_device: 7.2523, forward: 9.7524, to_cpu: 2.7330, format_outputs: 0.9939, postprocess: 4.1549, weight_update: 0.0007
[2021-01-20 15:19:10,020][21337] GPU learner timing: extract: 0.1688, buffers: 0.0580, batching: 4.6705, buff_ready: 0.2132, tensors_gpu_float: 1.5787, squeeze: 0.0072, prepare: 6.5977, batcher_mem: 4.5967
[2021-01-20 15:19:10,326][21337] Train loop timing: init: 1.4870, train_wait: 0.4033, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.2437, bptt_initial: 2.6465, bptt_forward_core: 0.2964, bptt: 0.3046, tail: 0.1642, vtrace: 0.7640, losses: 0.3053, clip: 5.2748, update: 7.0887, after_optimizer: 0.2019, train: 12.0959
[2021-01-20 15:19:10,490][21304] Collected {0: 2015232}, FPS: 48426.0
[2021-01-20 15:19:10,490][21304] Timing: experience: 41.4455

Version V115 (added CPC|A)
python -m algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v115 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2021-01-27 00:53:20,314][32001] Env runner 1, CPU aff. [1], rollouts 800: timing wait_actor: 0.0000, waiting: 1.2012, reset: 17.7077, save_policy_outputs: 0.9435, env_step: 34.6930, overhead: 3.1826, prepare_next_step: 1.3126, complete_rollouts: 0.0135, enqueue_policy_requests: 0.1804, one_step: 0.0155, work: 40.8831
[2021-01-27 00:53:20,367][32000] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 1.0936, reset: 16.2466, save_policy_outputs: 0.9469, env_step: 34.6927, overhead: 3.2715, prepare_next_step: 1.3379, complete_rollouts: 0.0151, enqueue_policy_requests: 0.1740, one_step: 0.0335, work: 41.0321
[2021-01-27 00:53:20,553][31999] Policy worker avg. requests 8.24, timing: init: 1.9279, wait_policy_total: 12.4854, wait_policy: 0.0051, handle_policy_step: 33.5467, one_step: 0.0039, deserialize: 1.1213, stack: 6.6523, obs_to_device: 7.0963, forward: 9.7408, to_cpu: 2.7980, format_outputs: 0.9287, postprocess: 4.1197, weight_update: 0.0007
[2021-01-27 00:53:20,657][31983] GPU learner timing: extract: 0.1711, buffers: 0.0588, batching: 4.6699, buff_ready: 0.2086, tensors_gpu_float: 1.6051, squeeze: 0.0060, prepare: 6.6216, batcher_mem: 4.5712
[2021-01-27 00:53:20,963][31983] Train loop timing: init: 1.4324, train_wait: 0.5063, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.2521, bptt_initial: 2.6369, bptt_forward_core: 0.3000, bptt: 0.3113, tail: 0.1638, vtrace: 0.8323, losses: 0.3192, clip: 5.2483, update: 7.1884, after_optimizer: 0.1826, train: 12.2588
[2021-01-27 00:53:21,129][31950] Collected {0: 2023424}, FPS: 48158.9
[2021-01-27 00:53:21,129][31950] Timing: experience: 41.6754

Version V116 (added pip package)
python -m sample_factory.algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v116 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2021-03-29 18:42:15,364][18268] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 1.2556, reset: 18.4990, save_policy_outputs: 0.9195, env_step: 34.6478, overhead: 3.0959, prepare_next_step: 1.3266, complete_rollouts: 0.0167, enqueue_policy_requests: 0.1889, one_step: 0.0145, work: 40.7663
[2021-03-29 18:42:15,380][18269] Env runner 1, CPU aff. [1], rollouts 800: timing wait_actor: 0.0000, waiting: 1.3308, reset: 17.0637, save_policy_outputs: 0.9361, env_step: 34.5326, overhead: 3.1220, prepare_next_step: 1.3382, complete_rollouts: 0.0138, enqueue_policy_requests: 0.1837, one_step: 0.0208, work: 40.7161
[2021-03-29 18:42:15,631][18267] Policy worker avg. requests 7.50, timing: init: 2.0053, wait_policy_total: 11.4973, wait_policy: 0.0051, handle_policy_step: 33.6328, one_step: 0.0055, deserialize: 1.1427, stack: 6.7386, obs_to_device: 7.2415, forward: 9.6114, to_cpu: 2.7343, format_outputs: 0.9479, postprocess: 4.1232, weight_update: 0.0007
[2021-03-29 18:42:15,736][18257] GPU learner timing: extract: 0.1719, buffers: 0.0588, batching: 4.7083, buff_ready: 0.2081, tensors_gpu_float: 1.7109, squeeze: 0.0045, prepare: 6.7697, batcher_mem: 4.6271
[2021-03-29 18:42:16,042][18257] Train loop timing: init: 1.4496, train_wait: 0.2670, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.2451, bptt_initial: 2.6272, bptt_forward_core: 0.3021, bptt: 0.3137, tail: 0.1655, vtrace: 0.8071, losses: 0.3095, clip: 5.3143, update: 7.1355, after_optimizer: 0.1482, train: 12.1167
[2021-03-29 18:42:16,219][18230] Collected {0: 2015232}, FPS: 48368.2
[2021-03-29 18:42:16,219][18230] Timing: experience: 41.4950

Version V1.118.0 (after Erik's numpy arrays)
python -m sample_factory.algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v1.118.0 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2021-05-06 01:50:12,126][23502] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 1.0902, reset: 16.0942, save_policy_outputs: 0.9888, env_step: 34.3452, overhead: 2.1221, prepare_next_step: 0.6871, complete_rollouts: 0.0116, enqueue_policy_requests: 0.2783, one_step: 0.0198, work: 38.9687
[2021-05-06 01:50:12,179][23503] Env runner 1, CPU aff. [1], rollouts 820: timing wait_actor: 0.0000, waiting: 1.0612, reset: 15.6712, save_policy_outputs: 0.9904, env_step: 34.3559, overhead: 2.2192, prepare_next_step: 0.6915, complete_rollouts: 0.0104, enqueue_policy_requests: 0.2212, one_step: 0.0304, work: 39.0570
[2021-05-06 01:50:12,379][23501] Policy worker avg. requests 6.84, timing: init: 1.9790, wait_policy_total: 13.8464, wait_policy: 0.0051, handle_policy_step: 29.9433, one_step: 0.0034, deserialize: 4.2096, stack: 0.2521, obs_to_device: 7.0503, forward: 10.2916, to_cpu: 3.1488, format_outputs: 1.0241, postprocess: 2.6734, weight_update: 0.0007
[2021-05-06 01:50:12,480][23490] GPU learner timing: extract: 0.1847, buffers: 0.0589, batching: 5.6264, buff_ready: 0.0398, tensors_gpu_float: 1.8703, squeeze: 0.0052, prepare: 7.6566, batcher_mem: 5.5381
[2021-05-06 01:50:12,786][23490] Train loop timing: init: 1.6372, train_wait: 0.3602, epoch_init: 0.0013, minibatch_init: 0.0007, forward_head: 0.2614, bptt_initial: 2.6642, bptt_forward_core: 0.3164, bptt: 0.3265, tail: 0.1797, vtrace: 0.8300, losses: 0.3164, clip: 5.4153, update: 7.2694, after_optimizer: 0.1417, train: 12.3422
[2021-05-06 01:50:12,895][23462] Collected {0: 2015232}, FPS: 50425.6
[2021-05-06 01:50:12,895][23462] Timing: experience: 39.8020

Version V1.119.0 (Using a queue of tensor indices instead of indexing shared buffers)
python -m sample_factory.algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v1.119.0 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2021-05-13 03:30:25,321][06125] Env runner 0, CPU aff. [0], rollouts 820: timing wait_actor: 0.0000, waiting: 0.7434, reset: 20.7285, save_policy_outputs: 1.0361, env_step: 34.5348, overhead: 2.2231, prepare_next_step: 0.8679, complete_rollouts: 0.0110, enqueue_policy_requests: 0.2217, one_step: 0.0133, work: 39.4845, wait_buffers: 0.0036
[2021-05-13 03:30:25,334][06126] Env runner 1, CPU aff. [1], rollouts 800: timing wait_actor: 0.0000, waiting: 0.6670, reset: 19.5223, save_policy_outputs: 1.0255, env_step: 34.5603, overhead: 2.1624, prepare_next_step: 0.8350, complete_rollouts: 0.0111, enqueue_policy_requests: 0.2856, one_step: 0.0223, work: 39.5636, wait_buffers: 0.0927
[2021-05-13 03:30:25,575][06124] Policy worker avg. requests 7.46, timing: init: 1.9440, wait_policy_total: 14.5109, wait_policy: 0.0051, handle_policy_step: 30.0012, one_step: 0.0041, deserialize: 3.8368, stack: 0.2607, obs_to_device: 7.2633, forward: 10.2820, to_cpu: 3.1964, format_outputs: 1.0618, postprocess: 2.8768, weight_update: 0.0008
[2021-05-13 03:30:25,678][06110] GPU learner timing: extract: 0.1649, buffers: 0.0584, batching: 5.5548, buff_ready: 0.1822, tensors_gpu_float: 1.6805, squeeze: 0.0063, prepare: 7.5435, batcher_mem: 5.4959
[2021-05-13 03:30:25,985][06110] Train loop timing: init: 1.4478, train_wait: 0.3855, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.2491, bptt_initial: 2.6962, bptt_forward_core: 0.3086, bptt: 0.3185, tail: 0.1938, vtrace: 0.8287, losses: 0.3403, clip: 5.4670, update: 7.3178, after_optimizer: 0.1435, train: 12.4720
[2021-05-13 03:30:26,114][06076] Collected {0: 2015232}, FPS: 50316.0
[2021-05-13 03:30:26,114][06076] Timing: experience: 39.8887

Version V1.120.0 (added inactive agents)
python -m sample_factory.algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v1.120.0 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2021-05-25 21:15:37,914][07565] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 1.0482, reset: 18.5269, split_output_tensors: 0.5974, save_policy_outputs: 1.1157, env_step: 34.0636, overhead: 2.2507, prepare_next_step: 0.8862, complete_rollouts: 0.0108, enqueue_policy_requests: 0.2815, one_step: 0.0142, work: 39.1658, wait_buffers: 0.0037
[2021-05-25 21:15:37,915][07564] Env runner 0, CPU aff. [0], rollouts 800: timing wait_actor: 0.0000, waiting: 0.9832, reset: 18.3495, split_output_tensors: 0.5992, save_policy_outputs: 1.1161, env_step: 34.0988, overhead: 2.2383, prepare_next_step: 0.9262, complete_rollouts: 0.0104, enqueue_policy_requests: 0.2587, one_step: 0.0142, work: 39.2207, wait_buffers: 0.0039
[2021-05-25 21:15:38,172][07563] Policy worker avg. requests 6.54, timing: init: 1.9728, wait_policy_total: 14.5304, wait_policy: 0.0051, handle_policy_step: 29.3400, one_step: 0.0000, deserialize: 3.8221, stack: 0.2594, obs_to_device: 6.9816, forward: 10.0754, to_cpu: 3.1540, format_outputs: 1.0242, postprocess: 2.8354, weight_update: 0.0006
[2021-05-25 21:15:38,257][07552] GPU learner timing: extract: 0.1747, buffers: 0.0607, batching: 5.8530, buff_ready: 0.1862, tensors_gpu_float: 1.8155, squeeze: 0.0054, prepare: 7.9784, batcher_mem: 5.7895
[2021-05-25 21:15:38,563][07552] Train loop timing: init: 1.3410, train_wait: 0.3794, epoch_init: 0.0012, minibatch_init: 0.0006, forward_head: 0.2626, bptt_initial: 2.6464, bptt_forward_core: 0.2900, bptt: 0.3029, tail: 0.1830, vtrace: 0.8645, losses: 0.4063, clip: 5.4409, update: 7.4087, after_optimizer: 0.1547, train: 12.6252
[2021-05-25 21:15:38,658][07524] Workers joined!
[2021-05-25 21:15:38,669][07524] Collected {0: 2015232}, FPS: 50412.1
[2021-05-25 21:15:38,669][07524] Timing: experience: 39.8127
WARNING: result on an empty system after reboot

Version V1.121.0 (added KL penalty)
Vizdoom 1.1.9! PyTorch 1.10, Python 3.9
python -m sample_factory.algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v1.121.0 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2021-11-08 21:25:02,543][19121] Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, waiting: 0.7572, reset: 17.8838, split_output_tensors: 0.6583, save_policy_outputs: 1.2270, env_step: 39.2074, overhead: 2.3631, prepare_next_step: 0.9331, complete_rollouts: 0.0129, enqueue_policy_requests: 0.2599, one_step: 0.0168, work: 44.6671, wait_buffers: 0.0255
[2021-11-08 21:25:02,605][19120] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 0.7435, reset: 18.5829, split_output_tensors: 0.6604, save_policy_outputs: 1.2546, env_step: 39.3314, overhead: 2.2430, prepare_next_step: 0.9688, complete_rollouts: 0.0107, enqueue_policy_requests: 0.2962, one_step: 0.0460, work: 44.7243, wait_buffers: 0.0040
[2021-11-08 21:25:02,790][19119] Policy worker avg. requests 6.84, timing: init: 3.3958, wait_policy_total: 16.7598, wait_policy: 0.0051, handle_policy_step: 31.1682, one_step: 0.0048, deserialize: 3.7056, stack: 0.2014, obs_to_device: 7.4448, forward: 11.5069, to_cpu: 3.1898, format_outputs: 1.0081, postprocess: 2.8041, weight_update: 0.0008
[2021-11-08 21:25:02,883][19071] GPU learner timing: extract: 0.2006, buffers: 0.0649, batching: 5.4289, buff_ready: 0.1964, tensors_gpu_float: 1.6935, squeeze: 0.0110, prepare: 7.4766, batcher_mem: 5.3692
[2021-11-08 21:25:03,189][19071] Train loop timing: init: 2.4280, train_wait: 0.3425, epoch_init: 0.0011, minibatch_init: 0.0007, forward_head: 0.3226, bptt_initial: 2.6329, bptt_forward_core: 0.2267, bptt: 0.2368, tail: 0.1528, vtrace: 0.6831, losses: 0.3713, clip: 0.2814, update: 2.0685, after_optimizer: 0.3953, train: 7.3341
[2021-11-08 21:25:03,314][19045] Collected {0: 2015232}, FPS: 44707.0
[2021-11-08 21:25:03,314][19045] Timing: experience: 44.8932
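
Comparing two of these flat timing lines by hand gets tedious. A minimal Python sketch for doing it automatically, assuming the "name: seconds" layout of the lines above. parse_timing, regressions and both thresholds are hypothetical helpers, not part of sample_factory. The two sample lines are shortened copies of the Env runner 1 lines from V1.120.0 and V1.121.0.

```python
import re


def parse_timing(line):
    """Extract 'name: seconds' pairs from one flat timing log line."""
    # Requires a space after the colon, so timestamps like 21:15:37 are skipped.
    pairs = re.findall(r"(\w+): (\d+\.\d+)", line)
    return {name: float(value) for name, value in pairs}


def regressions(old, new, min_seconds=1.0, min_ratio=1.1):
    """Return timers that got slower by at least min_seconds AND min_ratio.

    Both thresholds are arbitrary illustration values (assumption).
    """
    slower = {}
    for key in old.keys() & new.keys():
        if new[key] - old[key] >= min_seconds and new[key] >= old[key] * min_ratio:
            slower[key] = (old[key], new[key])
    return slower


# Shortened copies of the Env runner 1 lines (V1.120.0 vs V1.121.0).
v120 = ("Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, "
        "waiting: 1.0482, env_step: 34.0636, overhead: 2.2507, work: 39.1658")
v121 = ("Env runner 1, CPU aff. [1], rollouts 780: timing wait_actor: 0.0000, "
        "waiting: 0.7572, env_step: 39.2074, overhead: 2.3631, work: 44.6671")

slower = regressions(parse_timing(v120), parse_timing(v121))
for name, (before, after) in sorted(slower.items()):
    print(f"{name}: {before:.4f} -> {after:.4f}")
```

On these two lines it flags env_step and work, which matches the FPS drop from 50412.1 to 44707.0 between the two versions.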

Version V1.121.4
Vizdoom 1.1.11, PyTorch 1.10, Python 3.9, Ubuntu 20
python -m sample_factory.algorithms.appo.train_appo --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=20 --num_envs_per_worker=20 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_v1.121.4 --benchmark=True --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2
[2022-01-09 05:20:27,008][184145] Env runner 1, CPU aff. [1], rollouts 800: timing wait_actor: 0.0000, waiting: 0.6231, reset: 15.7217, split_output_tensors: 0.6470, save_policy_outputs: 1.1979, env_step: 35.7948, overhead: 2.4081, prepare_next_step: 0.9012, complete_rollouts: 0.0135, enqueue_policy_requests: 0.2657, one_step: 0.0145, work: 41.2160, wait_buffers: 0.0038
[2022-01-09 05:20:27,017][184144] Env runner 0, CPU aff. [0], rollouts 800: timing wait_actor: 0.0000, waiting: 0.6793, reset: 17.8522, split_output_tensors: 0.6255, save_policy_outputs: 1.1636, env_step: 35.8543, overhead: 2.3278, prepare_next_step: 0.8789, complete_rollouts: 0.0110, enqueue_policy_requests: 0.3222, one_step: 0.0182, work: 41.1648, wait_buffers: 0.0039
[2022-01-09 05:20:27,264][184143] Policy worker avg. requests 6.32, timing: init: 2.7602, wait_policy_total: 15.4553, wait_policy: 0.0052, handle_policy_step: 29.2352, one_step: 0.0025, deserialize: 3.5863, stack: 0.2062, obs_to_device: 7.6138, forward: 9.5064, to_cpu: 3.3929, format_outputs: 0.9884, postprocess: 2.5917, weight_update: 0.0006
[2022-01-09 05:20:27,366][184116] GPU learner timing: extract: 0.1750, buffers: 0.0567, batching: 5.2542, buff_ready: 0.1724, tensors_gpu_float: 1.6330, squeeze: 0.0064, prepare: 7.1875, batcher_mem: 5.1569
[2022-01-09 05:20:27,673][184116] Train loop timing: init: 1.9581, train_wait: 0.3825, epoch_init: 0.0011, minibatch_init: 0.0006, forward_head: 0.2330, bptt_initial: 2.9500, bptt_forward_core: 0.2780, bptt: 0.2895, tail: 0.1421, vtrace: 0.6693, losses: 0.4309, clip: 0.2765, update: 1.8503, after_optimizer: 0.3446, train: 7.3772
[2022-01-09 05:20:27,810][184052] Collected {0: 2015232}, FPS: 48606.9
[2022-01-09 05:20:27,811][184052] Timing: experience: 41.2913

Version V1.123.0
[2022-07-28 15:11:45,788][12501] Env runner 0, CPU aff. [0], rollouts 780: timing wait_actor: 0.0000, waiting: 0.5988, reset: 13.9294, split_output_tensors: 0.6392, save_policy_outputs: 1.2425, env_step: 38.6817, overhead: 2.4065, prepare_next_step: 1.0044, complete_rollouts: 0.0112, enqueue_policy_requests: 0.2773, one_step: 0.0159, work: 44.2826, wait_buffers: 0.0047
[2022-07-28 15:11:45,807][12502] Env runner 1, CPU aff. [1], rollouts 800: timing wait_actor: 0.0000, waiting: 0.6044, reset: 17.3649, split_output_tensors: 0.6319, save_policy_outputs: 1.2282, env_step: 38.8179, overhead: 2.3459, prepare_next_step: 0.9571, complete_rollouts: 0.0121, enqueue_policy_requests: 0.2495, one_step: 0.0199, work: 44.2758, wait_buffers: 0.0047
[2022-07-28 15:11:46,051][12500] Policy worker avg. requests 6.20, timing: init: 2.6750, wait_policy_total: 19.7717, wait_policy: 0.0051, handle_policy_step: 28.9186, one_step: 0.0026, deserialize: 3.5246, stack: 0.2156, obs_to_device: 7.1312, forward: 9.8601, to_cpu: 3.1674, format_outputs: 1.0316, postprocess: 2.6705, weight_update: 0.0007
[2022-07-28 15:11:46,144][12474] GPU learner timing: extract: 0.1809, buffers: 0.0660, buffer_stack_and_squeeze: 4.6107, batching: 4.2582, buff_ready: 0.1894, tensors_gpu_float: 1.4472, prepare: 10.7176, batcher_mem: 4.1866
[2022-07-28 15:11:46,450][12474] Train loop timing: init: 1.6059, train_wait: 0.4062, epoch_init: 0.0012, minibatch_init: 0.0007, forward_head: 0.2276, bptt_initial: 2.7156, bptt_forward_core: 0.2942, bptt: 0.3058, tail: 0.1498, vtrace: 0.8308, losses: 0.4001, kl_divergence: 0.0523, clip: 0.2922, update: 1.9485, after_optimizer: 0.3269, train: 7.3532
[2022-07-28 15:11:46,573][12413] Collected {0: 2015232}, FPS: 45141.6
[2022-07-28 15:11:46,574][12413] Timing: experience: 44.4610

Version v2.0.0.v68
python -m sf_examples.vizdoom_examples.train_vizdoom --env=doom_benchmark --algo=APPO --env_frameskip=4 --train_for_env_steps=4000000 --use_rnn=True --num_workers=20 --num_envs_per_worker=16 --num_policies=1 --num_epochs=1 --rollout=32 --recurrence=32 --batch_size=2048 --experiment=doom_battle_appo_sf2_v68 --benchmark=True --decorrelate_envs_on_one_worker=False --res_w=128 --res_h=72 --wide_aspect_ratio=True --policy_workers_per_policy=1 --worker_num_splits=2 --batched_sampling=False --serial_mode=False --async_rl=True
[2022-07-28 18:19:11,303][104459] Batcher 0 profile tree view:
batching: 8.6766, releasing_batches: 0.0027
[2022-07-28 18:19:11,303][104459] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 31.3772
update_model: 1.2661
  weight_update: 0.0008
one_step: 0.0065
  handle_policy_step: 62.8428
    deserialize: 7.0385, stack: 0.4377, obs_to_device_normalize: 16.4141, forward: 20.8333, to_cpu: 6.6601, save_outputs: 4.0125, send_messages: 3.9671
[2022-07-28 18:19:11,303][104459] Learner 0 profile tree view:
prepare_batch: 2.4887
train: 12.8213
  prepare_train: 0.0440, epoch_init: 0.0024, minibatch_init: 0.0035, forward_head: 0.3349, bptt_initial: 4.3416, tail: 0.2685, vtrace: 1.7941, losses: 0.8301, kl_divergence: 0.1001, after_optimizer: 0.4033
  bptt: 0.6011
    bptt_forward_core: 0.5777
  update: 3.5810
    clip: 0.5906
[2022-07-28 18:19:11,303][104459] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0420, prepare_next_step: 1.8627, env_step: 79.7330, overhead: 5.2951, complete_rollouts: 0.0860, enqueue_policy_requests: 0.6666
save_policy_outputs: 2.8068
  split_output_tensors: 1.2981
[2022-07-28 18:19:11,303][104459] RolloutWorker_w19 profile tree view:
wait_for_trajectories: 0.0361, prepare_next_step: 1.6234, enqueue_policy_requests: 0.5363, env_step: 66.1775, overhead: 4.7318, complete_rollouts: 0.0831
save_policy_outputs: 2.3827
  split_output_tensors: 1.1068
[2022-07-28 18:19:11,304][104459] Runner profile tree view:
main_loop: 104.3648
[2022-07-28 18:19:11,304][104459] Collected {0: 4005888}, FPS: 38305.0   <-- FPS is calculated differently now and includes startup time
Actual training FPS: (10 sec: 45875.1, 60 sec: 45602.4, 300 sec: 44422.8)
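
Rough sanity check of the FPS note above: dividing the collected frames by main_loop gives a number close to the reported FPS, which fits the figure covering the whole run including startup. This is only arithmetic on the numbers logged above. The exact formula used by the runner is not shown here, and throughput() is a hypothetical helper.

```python
def throughput(frames, seconds):
    """Frames per second over a wall-clock interval."""
    return frames / seconds


# Values copied from the v2.0.0.v68 log above.
frames = 4005888        # Collected {0: 4005888}
main_loop = 104.3648    # Runner profile: main_loop
reported_fps = 38305.0  # FPS printed by the runner

estimate = throughput(frames, main_loop)
# Relative gap between the estimate and the logged figure.
gap = abs(estimate - reported_fps) / reported_fps

print(f"estimate: {estimate:.1f}, reported: {reported_fps:.1f}, gap: {gap:.2%}")
```

The estimate lands within 1% of the reported value, well below the "Actual training FPS" windows, so the whole-run number should not be compared directly with the FPS of the older versions.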

Version v2.0.0.v75
New model builder API, plus new GPU (3090Ti), new CUDA, new PyTorch
[2022-08-30 02:17:23,176][449786] Batcher 0 profile tree view:
batching: 9.0868, releasing_batches: 0.0143
[2022-08-30 02:17:23,177][449786] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 28.4021
update_model: 1.3279
  weight_update: 0.0009
one_step: 0.0037
  handle_policy_step: 67.6332
    deserialize: 6.9425, stack: 0.4945, obs_to_device_normalize: 15.1023, forward: 28.5246, send_messages: 4.0288
    prepare_outputs: 9.3175
      to_cpu: 4.3431
[2022-08-30 02:17:23,177][449786] Learner 0 profile tree view:
prepare_batch: 3.3698
train: 12.7842
  epoch_init: 0.0020, minibatch_init: 0.0035, losses_postprocess: 0.1301, kl_divergence: 0.1722, after_optimizer: 1.2661
  calculate_losses: 6.3435
    losses_init: 0.0024, forward_head: 0.5065, bptt_initial: 2.4949, tail: 0.2926, advantages_returns: 1.8817, losses: 0.4954
    bptt: 0.5209
      bptt_forward_core: 0.4975
  update: 4.5490
    clip: 0.6051
[2022-08-30 02:17:23,177][449786] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0563, enqueue_policy_requests: 2.6546, env_step: 80.1135, overhead: 5.5012, complete_rollouts: 0.1301
save_policy_outputs: 2.8837
  split_output_tensors: 1.3435
[2022-08-30 02:17:23,177][449786] RolloutWorker_w19 profile tree view:
wait_for_trajectories: 0.0347, enqueue_policy_requests: 2.2328, env_step: 66.3553, overhead: 4.5816, complete_rollouts: 0.0768
save_policy_outputs: 2.4376
  split_output_tensors: 1.1317
[2022-08-30 02:17:23,177][449786] Loop Runner_EvtLoop terminating...
[2022-08-30 02:17:23,178][449786] Runner profile tree view:
main_loop: 110.6443
[2022-08-30 02:17:23,178][449786] Collected {0: 4014080}, FPS: 36279.1


Version v2.0.0.v77 (switched to Gym 0.26.2, latest CUDA, PyTorch and everything else).
[2022-10-06 17:47:42,848][159866] Batcher 0 profile tree view:
batching: 8.5083, releasing_batches: 0.0166
[2022-10-06 17:47:42,849][159866] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 36.6387
update_model: 1.3975
  weight_update: 0.0010
one_step: 0.0051
  handle_policy_step: 65.1514
    deserialize: 6.8979, stack: 0.5148, obs_to_device_normalize: 16.1739, forward: 22.3737, send_messages: 3.9840
    prepare_outputs: 12.1662
      to_cpu: 7.2696
[2022-10-06 17:47:42,849][159866] Learner 0 profile tree view:
prepare_batch: 2.7143
train: 23.7993
  epoch_init: 0.0023, minibatch_init: 0.0037, losses_postprocess: 0.1572, kl_divergence: 0.1468, after_optimizer: 9.7619
  calculate_losses: 9.6756
    losses_init: 0.0019, forward_head: 0.3254, bptt_initial: 5.4393, tail: 0.2738, advantages_returns: 2.2930, losses: 0.5491
    bptt: 0.6322
      bptt_forward_core: 0.6085
  update: 3.7350
    clip: 0.6279
[2022-10-06 17:47:42,849][159866] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0419, enqueue_policy_requests: 2.6877, env_step: 86.6110, overhead: 5.5934, complete_rollouts: 0.1088
save_policy_outputs: 3.0250
  split_output_tensors: 1.4407
[2022-10-06 17:47:42,850][159866] RolloutWorker_w19 profile tree view:
wait_for_trajectories: 0.0351, enqueue_policy_requests: 2.2775, env_step: 70.5727, overhead: 4.8251, complete_rollouts: 0.0446
save_policy_outputs: 2.4676
  split_output_tensors: 1.1437
[2022-10-06 17:47:42,850][159866] Loop Runner_EvtLoop terminating...
[2022-10-06 17:47:42,851][159866] Runner profile tree view:
main_loop: 112.9504
[2022-10-06 17:47:42,852][159866] Collected {0: 4014080}, FPS: 35538.4
