# Available algorithms

![](_static/green_circ10.png "green circle"): thoroughly tested. In many cases,
we verified against known values and/or reproduced results from papers.

<font color="orange"><b>~</b></font>: implemented but lightly tested.

<font color="red"><b>X</b></font>: known problems; please see GitHub issues.

Algorithms                                        | Category     | Reference                                                                                                                                                                                                                                                                                                       | Status
------------------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------
Information Set Monte Carlo Tree Search (IS-MCTS) | Search       | [Cowley et al. '12](https://ieeexplore.ieee.org/abstract/document/6203567)                                                                                                                                                                                                                                      | <font color="orange"><b>~</b></font>
Minimax (and Alpha-Beta) Search                   | Search       | [Wikipedia1](https://en.wikipedia.org/wiki/Minimax#Minimax_algorithm_with_alternate_moves), [Wikipedia2](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning), Knuth and Moore '75                                                                                                                         | ![](_static/green_circ10.png "green circle")
Monte Carlo Tree Search                           | Search       | [Wikipedia](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search), [UCT paper](http://ggp.stanford.edu/readings/uct.pdf), [Coulom '06](https://hal.inria.fr/inria-00116992/document), [Browne et al. survey](http://www.incompleteideas.net/609%20dropbox/other%20readings%20and%20resources/MCTS-survey.pdf)  | ![](_static/green_circ10.png "green circle")
Lemke-Howson (via <tt>nashpy</tt>)                | Opt.         | [Wikipedia](https://en.wikipedia.org/wiki/Lemke%E2%80%93Howson_algorithm), [Shoham &amp; Leyton-Brown '09](http://masfoundations.org/)                                                                                                                                                                          | ![](_static/green_circ10.png "green circle")
Sequence-form linear programming                  | Opt.         | [Koller, Megiddo, and von Stengel '94](http://theory.stanford.edu/~megiddo/pdf/stoc94.pdf), <br> [Shoham &amp; Leyton-Brown '09](http://masfoundations.org/)                                                                                                                                                    | ![](_static/green_circ10.png "green circle")
Counterfactual Regret Minimization (CFR)          | Tabular      | [Zinkevich et al. '08](https://poker.cs.ualberta.ca/publications/NIPS07-cfr.pdf), [Neller &amp; Lanctot '13](http://modelai.gettysburg.edu/2013/cfr/cfr.pdf)                                                                                                                                                    | ![](_static/green_circ10.png "green circle")
CFR against a best responder (CFR-BR)             | Tabular      | [Johanson et al. '12](https://poker.cs.ualberta.ca/publications/AAAI12-cfrbr.pdf)                                                                                                                                                                                                                               | ![](_static/green_circ10.png "green circle")
Exploitability / Best response                    | Tabular      | [Shoham &amp; Leyton-Brown '09](http://masfoundations.org/)                                                                                                                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
External sampling Monte Carlo CFR                 | Tabular      | [Lanctot et al. '09](http://mlanctot.info/files/papers/nips09mccfr.pdf), [Lanctot '13](http://mlanctot.info/files/papers/PhD_Thesis_MarcLanctot.pdf)                                                                                                                                                            | ![](_static/green_circ10.png "green circle")
Outcome sampling Monte Carlo CFR                  | Tabular      | [Lanctot et al. '09](http://mlanctot.info/files/papers/nips09mccfr.pdf), [Lanctot '13](http://mlanctot.info/files/papers/PhD_Thesis_MarcLanctot.pdf)                                                                                                                                                            | ![](_static/green_circ10.png "green circle")
Q-learning                                        | Tabular      | [Sutton &amp; Barto '18](http://incompleteideas.net/book/the-book-2nd.html)                                                                                                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
SARSA                                             | Tabular      | [Sutton &amp; Barto '18](http://incompleteideas.net/book/the-book-2nd.html)                                                                                                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
Policy Iteration                                  | Tabular      | [Sutton &amp; Barto '18](http://incompleteideas.net/book/the-book-2nd.html)                                                                                                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
Value Iteration                                   | Tabular      | [Sutton &amp; Barto '18](http://incompleteideas.net/book/the-book-2nd.html)                                                                                                                                                                                                                                     | ![](_static/green_circ10.png "green circle")
Advantage Actor-Critic (A2C)                      | RL           | [Mnih et al. '16](https://arxiv.org/abs/1602.01783)                                                                                                                                                                                                                                                             | ![](_static/green_circ10.png "green circle")
Deep Q-networks (DQN)                             | RL           | [Mnih et al. '15](https://www.nature.com/articles/nature14236)                                                                                                                                                                                                                                                  | ![](_static/green_circ10.png "green circle")
Ephemeral Value Adjustments (EVA)                 | RL           | [Hansen et al. '18](https://arxiv.org/abs/1810.08163)                                                                                                                                                                                                                                                           | <font color="orange"><b>~</b></font>
Deep CFR                                          | MARL         | [Brown et al. '18](https://arxiv.org/abs/1811.00164)                                                                                                                                                                                                                                                            | <font color="orange"><b>~</b></font>
Exploitability Descent (ED)                       | MARL         | [Lockhart et al. '19](https://arxiv.org/abs/1903.05614)                                                                                                                                                                                                                                                         | ![](_static/green_circ10.png "green circle")
(Extensive-form) Fictitious Play (XFP)            | MARL         | [Heinrich, Lanctot, &amp; Silver '15](http://proceedings.mlr.press/v37/heinrich15.pdf)                                                                                                                                                                                                                          | ![](_static/green_circ10.png "green circle")
Neural Fictitious Self-Play (NFSP)                | MARL         | [Heinrich &amp; Silver '16](https://arxiv.org/abs/1603.01121)                                                                                                                                                                                                                                                   | ![](_static/green_circ10.png "green circle")
Neural Replicator Dynamics (NeuRD)                | MARL         | [Omidshafiei, Hennes, Morrill, et al. '19](https://arxiv.org/abs/1906.00190)                                                                                                                                                                                                                                    | <font color="red"><b>X</b></font>
Regret Policy Gradients (RPG, RMPG)               | MARL         | [Srinivasan, Lanctot, et al. '18](https://arxiv.org/abs/1810.09026)                                                                                                                                                                                                                                             | ![](_static/green_circ10.png "green circle")
Policy-Space Response Oracles (PSRO)              | MARL         | [Lanctot et al. '17](https://arxiv.org/abs/1711.00832)                                                                                                                                                                                                                                                          | ![](_static/green_circ10.png "green circle")
Q-based ("all-actions") Policy Gradient (QPG)     | MARL         | [Srinivasan, Lanctot, et al. '18](https://arxiv.org/abs/1810.09026)                                                                                                                                                                                                                                             | ![](_static/green_circ10.png "green circle")
Regression CFR (RCFR)                             | MARL         | [Waugh et al. '15](https://arxiv.org/abs/1411.7974), [Morrill '16](https://poker.cs.ualberta.ca/publications/Morrill_Dustin_R_201603_MSc.pdf)                                                                                                                                                                   | ![](_static/green_circ10.png "green circle")
Rectified Nash Response (PSRO_rn)                 | MARL         | [Balduzzi et al. '19](https://arxiv.org/abs/1901.08106)                                                                                                                                                                                                                                                         | <font color="orange"><b>~</b></font>
&alpha;-Rank                                      | Eval. / Viz. | [Omidshafiei et al. '19](https://www.nature.com/articles/s41598-019-45619-9), [arXiv](https://arxiv.org/abs/1903.01373)                                                                                                                                                                                         | ![](_static/green_circ10.png "green circle")
Replicator / Evolutionary Dynamics                | Eval. / Viz. | [Hofbauer &amp; Sigmund '98](https://www.cambridge.org/core/books/evolutionary-games-and-population-dynamics/A8D94EBE6A16837E7CB3CED24E1948F8), [Sandholm '10](https://mitpress.mit.edu/books/population-games-and-evolutionary-dynamics)                                                                       | ![](_static/green_circ10.png "green circle")
