- 3p_v6.1: mcts, fix localzero or lost can only do DONOTHING_ZERO at 1st round, id encoded
- 3p_v6.5: monte-carlo simulation method, fix 1st round can only do nothing (2 do nothing states)
- 3p_v6.6: mcts method, fix 1st round can only do nothing (2 do nothing states)
- 3p_v6.7: mcts, fix 1st round can only do nothing (2 do nothing states), and if only localzero + lost: DONOTHING_ZERO
- 3p_v6.8: mcts, fix localzero or lost can only commit ZERO at 1st round
- 3p_v7.0: mcts, stick with the same crash setting for one run of mcts (no history encoded)
- 3p_v7.1: same as 7.0 but history encoded

- 3p_v8.0: same setting as 6.6, add new implementation
- 3p_v8.1: same setting as 6.6, tune reward function like more localone and commit one, gets higher positive reward
- 3p_v9.4: mcts and fix correct answer in second round. Fix 1st round can only do nothing, use policy network to verify, store only positive-reward training samples (pass all)
- 3p_v9.5: mcts and fix correct answer in both rounds. Fix 1st round can only do nothing, use policy network to verify

- 3p_v12.0: mcts, load 3p_v11.0, fix close actions, policy network (pass all)
- 3p_v12.1: mcts, load 3p_v11.0, fix close actions, value network (pass all)
- 3p_v12.2: mcts, load 3p_v11.0, only fix the 1st detected (state, action) pair, policy network, does not stop even when all pass

- 3p_v13.0(1): mcts, fix 1st round actions, policy network
- 3p_v13.2/3: continuing with fixing logic.
- 3p_v13.4: mcts, start from scratch with fixing logic
- 3p_v13.5: mcts, un/fix one thing at a time
- 3p_v13.6: mcts, only one state is fixed, almost random training
- 3p_v13.7: mcts, load 13.6 model, fix and unfix one thing at a time, unfix latest, has the issue written in g_doc
- 3p_v13.10: mcts, load 13.6, fix and unfix one thing at a time, and unfix the latest if it cannot pass within 5 iterations with no new fixing or unfixing
- 3p_v13.11: mcts, load 13.6, fix all and unfix one by swapping

- 3p_v15.0: mcts, load 13.6, with new implementation of StateActionTracker.py
- 3p_v15.1: load 3p, only fix 1 state to test how it can propagate to other states
- 3p_v15.2: load 3p, fix 1 state every 20 iterations. Good

- 3p: mcts, from scratch, pre-train without any fixing, 1st round can only do nothing
- 3p_1: mcts, from scratch, pre-train without any fixing, 1st round can also commit

- dl_5p_v0.0: standard mcts, value network
- dl_5p_v0.2: standard mcts, tune reward, value network
- dl_5p_v1.0: mcts and fix correct answer, policy network
- dl_5p_v2.0: mcts and fix correct answer, lower learning rate to 0.0001, use huber loss, value network (pass all)
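
For reference, the Huber loss used in dl_5p_v2.0 is quadratic for small errors and linear for large ones, which makes value targets less sensitive to outlier rewards. A minimal sketch in plain Python; the function name and `delta=1.0` are assumptions for illustration, not taken from the training code:

```python
def huber_loss(prediction, target, delta=1.0):
    """Huber loss for a single scalar prediction.

    delta is a hypothetical default; the value used in training is not recorded here.
    """
    error = prediction - target
    abs_error = abs(error)
    if abs_error <= delta:
        # Quadratic region: behaves like half the squared error.
        return 0.5 * error * error
    # Linear region: grows proportionally to the absolute error.
    return delta * (abs_error - 0.5 * delta)
```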

- ac_3p_v0.0: no history, mcts, atomic commit, donothing and commit at the 1st round and then crash have the same preference
- ac_3p_v0.1: same as above but fix all localcommits to commit
- ac_3p_v0.2: same as above, fix all localcommits to commit and prefer commit/abort at 1st round and crash by tuning reward functions

- pb_3p_init: init model, no history
- pb_3p_v0.0: no history, same preference, auto fixing
- pb_3p_v0.1: reward preference, auto fixing
- pb_3p_v0.2: history, tune reward, auto fixing
- pb_3p_v0.6: history, no deepcopy, auto fixing
- pb_3p_v0.7: same as above
- pb_3p_v0.8: from scratch, same as above, good
- pb_3p_v0.9: from scratch, no history, auto fixing

no history:
- pb_3p_pretrain: pretrain with no fixing logic
- pb_3p_pretrain_1: same as above

- pb_3p_v2.0: from pb_3p_pretrain
- pb_3p_v3.0: pretrain 80 iters; dfs interval: 20 iters; only do nothing at 1st round; good model; unfixing triggered
- pb_3p_v3.1: same as above, but no fixing triggered; good model
- pb_3p_v3.3: same as above; good; 1 unfixing triggered (loose unfixing condition)
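
One possible reading of the pb_3p_v3.* schedule (pretrain 80 iters, then a DFS pass every 20 iters) is sketched below. The function name, the 1-based iteration count, and the assumption that the first DFS pass lands at iteration 100 are all hypothetical; what the DFS pass itself does is not shown here.

```python
def dfs_due(iteration, pretrain_iters=80, dfs_interval=20):
    """Return True if a DFS pass is due at this 1-based iteration.

    Assumes no DFS pass runs during pretraining and that passes
    are spaced dfs_interval iterations apart afterwards.
    """
    if iteration <= pretrain_iters:
        return False
    return (iteration - pretrain_iters) % dfs_interval == 0
```

Under these assumptions, DFS passes would run at iterations 100, 120, 140, and so on.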

- pb_3p_v5.0: no node id, with minimax-style design, good

- pb_4p_v0.0: encode round number rather than node id. (pb_4p_v0.0_433_best)
- pb_4p_v1.0: no node id, with minimax-style design, not good.

- pb_4p_tran_v0.*/v1.*: transformer + stage sampling
- pb_4p_tran_v2.*: same, but remove input dimension limitation