DeepStack
We welcome suggestions and improvements.
Thank you to Michael Bowling, Michael Johanson, and Marc Lanctot for contributions to this guide.
Additionally, this would not have been possible without the generous support of Prof. Joan Bruna and his class at NYU, The Mathematics of Deep Learning. Special thanks to him, as well as Martin Arjovsky, my colleague in leading this recitation, and my fellow students Ojas Deshpande, Anant Gupta, Xintian Han, Sanyam Kapoor, Chen Li, Yixiang Luo, Chirag Maheshwari, Zsolt Pajor-Gyulai, Roberta Raileanu, Ryan Saxe, and Liang Zhuo.
Along with Libratus, DeepStack is one of two approaches to solving No-Limit Texas Hold'em that debuted at roughly the same time. This game was notoriously difficult to solve: it has a branching factor as large as Go's, but it is additionally a game of imperfect information.
The main idea behind both DeepStack and Libratus is to use Counterfactual Regret Minimization (CFR) to find a mixed strategy that approximates a Nash Equilibrium strategy. CFR’s convergence properties guarantee that we will obtain such a strategy, and the closer we get to it, the better our outcome will be. The two approaches differ in their implementation. In particular, DeepStack uses deep neural networks to approximate the counterfactual value of each hand at specific points in the game. While remaining mathematically sound, this lets it cut short the computation needed to reach convergence.
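One fact worth keeping in mind throughout (stated here informally, as in the CFR literature): in a two-player zero-sum game, if after \(T\) iterations both players' average regrets satisfy \(R_{i}^{T}/T \leq \epsilon\), then the average strategy profile \(\bar{\sigma}^{T}\) is a \(2\epsilon\)-Nash equilibrium, i.e.
\[u_{i}(\bar{\sigma}^{T}) \geq \max_{\sigma_{i}'} u_{i}(\sigma_{i}', \bar{\sigma}_{-i}^{T}) - 2\epsilon.\]
This is why driving average regret to zero is enough to approximate an equilibrium.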
In this curriculum, you will explore the study of games with a tour through game theory and counterfactual regret minimization while building up the requisite understanding to tackle DeepStack. Along the way, you will learn all of the necessary topics, including the branching factor, Nash Equilibria, and CFR.
- MAS: Multi Agent Systems.
- LT: Marc Lanctot’s Thesis.
- ICRM: Introduction to Counterfactual Regret Minimization.
- PLG: Prediction, Learning, and Games.
Motivation: Most of Game Theory, as well as the particular techniques used in DeepStack and Libratus, is built on the framework of Normal Form Games. These are descriptions of games, familiarly represented as payoff matrices; a famous example is the Prisoner’s Dilemma. In this section, we cover the basics of Normal Form Games. In addition, we go over the rules of Poker and why it has proved so difficult to solve.
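To make the normal-form picture concrete, here is a small sketch (the payoff numbers are the conventional Prisoner's Dilemma values, not something taken from the readings) that brute-forces the pure-strategy Nash equilibria of a two-player matrix game:

```python
import numpy as np

# Prisoner's Dilemma in normal form. Actions: 0 = Cooperate, 1 = Defect.
# payoffs[i] is player i's payoff matrix (row = player 0's action, col = player 1's action).
payoffs = [
    np.array([[-1, -3],
              [ 0, -2]]),  # player 0
    np.array([[-1,  0],
              [-3, -2]]),  # player 1
]

def pure_nash_equilibria(payoffs):
    """Return all pure-strategy profiles where neither player can gain by deviating."""
    u0, u1 = payoffs
    equilibria = []
    for a0 in range(u0.shape[0]):
        for a1 in range(u0.shape[1]):
            best_for_0 = u0[a0, a1] >= u0[:, a1].max()   # player 0 has no better row
            best_for_1 = u1[a0, a1] >= u1[a0, :].max()   # player 1 has no better column
            if best_for_0 and best_for_1:
                equilibria.append((a0, a1))
    return equilibria

print(pure_nash_equilibria(payoffs))  # [(1, 1)] -> mutual defection
```

The only pure equilibrium is mutual defection, even though mutual cooperation gives both players more, a preview of the solution concepts discussed in the next section.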
Required Reading:
- MAS: Sections 3.1 & 3.2.
- LT: Pages 5-7.
- The Game of Poker: Supplementary #1 on pages 16-17.
Optional Reading:
- The State of Solving Large Incomplete-Information Games, and Application to Poker (2010)
- Why Poker is Difficult: a very good video by Noam Brown, the main author of Libratus. The first eighteen minutes are the most relevant.
Questions:
- LT: Prove that in a zero-sum game, the Nash Equilibrium strategies are interchangeable.
Hint
Use the definition of a Nash Equilibrium along with the fact that \(\mu_{i}(\sigma_{i}, \sigma_{-i}) + \mu_{-i}(\sigma_{i}, \sigma_{-i}) = c\).
- LT: Prove that in a zero-sum game, the expected payoff to each player is the same for every equilibrium.
Solution
We will solve both this problem and the one above here. Suppose \(\sigma = (\sigma_{i}, \sigma_{-i})\) and \(\sigma' = (\sigma_{i}', \sigma_{-i}')\) are both Nash Equilibria. Then:
\[
\begin{aligned}
\mu_{i}(\sigma_{i}, \sigma_{-i}) &\geq \mu_{i}(\sigma_{i}', \sigma_{-i}) \\
&= c - \mu_{-i}(\sigma_{i}', \sigma_{-i}) \\
&\geq c - \mu_{-i}(\sigma_{i}', \sigma_{-i}') \\
&= \mu_{i}(\sigma_{i}', \sigma_{-i}')
\end{aligned}
\]
In a similar fashion, we can show that \(\mu_{i}(\sigma_{i}', \sigma_{-i}') \geq \mu_{i}(\sigma_{i}, \sigma_{-i})\).
Consequently, \(\mu_{i}(\sigma_{i}', \sigma_{-i}') = \mu_{i}(\sigma_{i}, \sigma_{-i})\), which also implies that the strategies are interchangeable, i.e. \(\mu_{i}(\sigma_{i}', \sigma_{-i}') = \mu_{i}(\sigma_{i}', \sigma_{-i})\).
- MAS: Prove Lemma 3.1.6.
\(\textit{Lemma}\): If a preference relation \(\succeq\) satisfies the axioms completeness, transitivity, decomposability, and monotonicity, and if \(o_1 \succ o_2\) and \(o_2 \succ o_3\), then there exists a probability \(p\) s.t. \(\forall p' < p\), \(o_2 \succ [p': o_1; (1 - p'): o_3]\), and for all \(p' > p\), \([p': o_1; (1 - p'): o_3] \succ o_2\).
- MAS: Theorem 3.1.8 ensures that rational agents need only maximize the expectation of single-dimensional utility functions. Prove this result as a good test of your understanding.
\(\textit{Theorem}\): If a preference relation \(\succeq\) satisfies the axioms completeness, transitivity, substitutability, decomposability, monotonicity, and continuity, then there exists a function \(u: \mathbb{L} \mapsto [0, 1]\) with the properties that:
- \(u(o_1) \geq u(o_2)\) iff \(o_1 \succeq o_2\).
- \(u([p_1 : o_1, \ldots, p_k : o_k]) = \sum_{i=1}^k p_{i}\,u(o_i)\).
Motivation: How do you reason about games? The best strategies in multi-agent scenarios depend on the choices of others. Game theory deals with this problem by identifying subsets of outcomes called solution concepts. In this section, we discuss the fundamental solution concepts: Nash Equilibrium, Pareto Optimality, and Correlated Equilibrium. For each solution concept, we cover what it implies for a given game and how difficult it is to discover a representative strategy.
Required Reading:
- MAS: Sections 3.3, 3.4.5, 3.4.7, 4.1, 4.2.4, 4.3, & 4.6.
- LT: Section 2.1.1.
Optional Reading:
- MAS: Section 3.4.
Questions:
- Why must every game have a Pareto optimal strategy?
Solution
Suppose a game has no Pareto optimal outcome. Then for every outcome \(O\) there is another outcome \(O'\) that Pareto-dominates it; say \(O_2 > O_1\). Because \(O_2\) is not Pareto optimal, there is some \(O_k > O_2\), and so on. Since the game has only finitely many outcomes, this chain cannot have a maximum (any maximum would be Pareto optimal), so it must eventually form a cycle. Following the cycle and using transitivity, some outcome \(O_j\) Pareto-dominates itself, i.e. \(O_j > O_j\), which is a contradiction.
- Why must there always exist at least one Pareto optimal strategy in which all players adopt pure strategies?
- Why in common-payoff games do all Pareto optimal strategies have the same payoff?
Solution
Say two strategies \(S\) and \(S'\) are Pareto optimal. Then neither dominates the other, so either \(\forall i,\ \mu_{i}(S) = \mu_{i}(S')\), or there are two players \(i, j\) for which \(\mu_{i}(S) < \mu_{i}(S')\) and \(\mu_{j}(S) > \mu_{j}(S')\). In the former case, the two strategies have the same payoff, as desired. In the latter case, we have a contradiction because all players receive the same payoff in a common-payoff game, so \(\mu_{j}(S') = \mu_{i}(S') > \mu_{i}(S) = \mu_{j}(S) > \mu_{j}(S')\). Thus, all of the Pareto optimal strategies must have the same payoff.
- MAS: Why does definition 3.3.12 imply that the vertices of a simplex must all receive different labels?
Solution
This follows from the definitions of \(\mathbb{L}(v)\) and \(\chi(v)\). At a vertex of the simplex, the range of \(\chi\) contains only the single value determined by that vertex, and consequently \(\mathbb{L}\) must as well.
- MAS: Why in definition 3.4.12 does it not matter that the mapping is to pure strategies rather than to mixed strategies?
- Take your favorite normal-form game, find a Nash Equilibrium, and then find a corresponding Correlated Equilibrium.
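If you want to check your answer to the last exercise programmatically, here is a sketch that verifies the incentive constraints of a candidate correlated equilibrium. The game (Chicken) and the candidate distribution are a standard textbook example, not something drawn from MAS:

```python
import numpy as np

# Game of Chicken. Actions: 0 = Dare, 1 = Chicken (swerve).
u = [
    np.array([[0, 7],
              [2, 6]]),  # player 0's payoffs (row = player 0, col = player 1)
    np.array([[0, 2],
              [7, 6]]),  # player 1's payoffs
]

# Candidate correlated equilibrium: a mediator recommends (D,C), (C,D), (C,C)
# each with probability 1/3, and never (D,D).
P = np.array([[0, 1/3],
              [1/3, 1/3]])

def is_correlated_equilibrium(u, P, tol=1e-9):
    """Check that no player gains in expectation by deviating from any recommendation."""
    n_actions = P.shape
    for player in (0, 1):
        for rec in range(n_actions[player]):          # recommended action
            for dev in range(n_actions[player]):      # possible deviation
                gain = 0.0
                for other in range(n_actions[1 - player]):
                    prob = P[rec, other] if player == 0 else P[other, rec]
                    follow = u[player][rec, other] if player == 0 else u[player][other, rec]
                    deviate = u[player][dev, other] if player == 0 else u[player][other, dev]
                    gain += prob * (deviate - follow)
                if gain > tol:
                    return False
    return True

print(is_correlated_equilibrium(u, P))  # True for this distribution
```

Each constraint says that, conditioned on its recommendation, a player cannot profit in expectation by switching to any other action.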
Motivation: What happens when players don’t act simultaneously? Extensive Form Games are an answer to this question. While any game in this representation also has an equivalent Normal Form, it’s much more natural to reason about sequential games in this format. Examples include familiar games like Go, but also more exotic ones like Magic: The Gathering and Civilization. This section is essential, as Poker is best described as an Extensive Form Game.
Required Reading:
- MAS: Sections 5.1.1 - 5.1.3.
- MAS: Sections 5.2.1 - 5.2.3.
- Accelerating Best Response Calculation in Large Extensive Games: This is important for understanding how to evaluate Poker algorithms.
Optional Reading:
- LT: Section 2.1.2.
Questions:
- What is the intuition for why not all normal-form games can be transformed into perfect-information extensive-form games?
Solution
The problem is one of modeling simultaneity. Perfect-information extensive-form games have trouble modeling concurrent moves because they have an explicit temporal ordering of moves.
- Why does that change when the transformation is to imperfect-information extensive-form games?
- How are the set of behavioral strategies different from the set of mixed strategies?
Solution
A mixed strategy is a single distribution over pure strategies. A behavioral strategy is a vector of distributions over actions, with one distribution assigned independently at each information set.
- Succinctly describe the technique demonstrated in the Accelerating Best Response paper.
Motivation: Counterfactual Regret Minimization (CFR) is only a decade old but has already achieved huge success as the foundation underlying DeepStack and Libratus. In the first of two weeks dedicated to CFR, we learn how the algorithm works practically and get our hands dirty coding up our implementation.
The optional readings are papers introducing CFR-D and CFR+, further iterations upon CFR. These are both used in DeepStack.
Required Reading:
- ICRM: Sections 2.1-2.4.
- ICRM: Sections 3.1-3.4.
- LT: Section 2.2.
- Regret Minimization in Games with Incomplete Information.
Optional Reading: These two papers are CFR extensions used in DeepStack.
- Solving Imperfect Information Games Using Decomposition: CFR-D.
- Solving Large Imperfect Information Games Using CFR+: CFR+.
Questions:
- What is the difference between external regret, internal regret, swap regret, and counterfactual regret?
Hint
The definitions of the four are the following:
- External Regret: How much the algorithm regrets not taking the best single decision in hindsight. We compare against a policy that plays one fixed action at all timesteps.
- Internal Regret: How much the algorithm regrets making one particular choice whenever it instead made another. An example: how much better you would have done if, every time you bought Amazon stock, you had bought Microsoft stock instead.
- Swap Regret: Similar to Internal Regret, but instead of one action being replaced wholesale with another, we allow any number of such swaps.
- Counterfactual Regret: Assuming that your actions take you to a node, this is the expected value of that node over your opponents' strategies. The counterfactual component is that we assume you reach that node with probability one.
- Why is Swap Regret important?
Hint
Swap Regret is connected to Correlated Equilibrium. Can you see why?
- Implement CFR (or CFR+ / CFR-D) in your favorite programming language to play Leduc Poker or Liar’s Dice. (A small regret-matching warm-up sketch follows this list of questions.)
- How do you know if you’ve implemented CFR correctly?
Solution
One way is to test it by implementing Local Best Response (LBR). A correct CFR implementation should hold up well against LBR, an algorithm specifically designed to exploit it.
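As a warm-up for the implementation exercise above, the sketch below runs plain regret matching in self-play on rock-paper-scissors, in the spirit of the ICRM tutorial. This is only the building block CFR applies at each information set, not DeepStack's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 3  # rock, paper, scissors
# Payoff to player 0 for (player 0's action, player 1's action).
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

def strategy_from_regrets(regrets):
    """Regret matching: play in proportion to positive cumulative regret."""
    positive = np.maximum(regrets, 0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(N_ACTIONS, 1.0 / N_ACTIONS)

def train(iterations=100_000):
    regrets = np.zeros((2, N_ACTIONS))
    strategy_sums = np.zeros((2, N_ACTIONS))
    for _ in range(iterations):
        strategies = [strategy_from_regrets(regrets[p]) for p in (0, 1)]
        actions = [rng.choice(N_ACTIONS, p=strategies[p]) for p in (0, 1)]
        for p in (0, 1):
            me, opp = actions[p], actions[1 - p]
            payoff_matrix = PAYOFF if p == 0 else -PAYOFF.T
            # Regret of each alternative action versus the action actually played.
            regrets[p] += payoff_matrix[:, opp] - payoff_matrix[me, opp]
            strategy_sums[p] += strategies[p]
    return strategy_sums / strategy_sums.sum(axis=1, keepdims=True)

print(train())  # both average strategies should approach (1/3, 1/3, 1/3)
```

Extending this toward the Leduc exercise means keeping one such regret table per information set and weighting the updates by counterfactual reach probabilities.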
Motivation: In the last section, we saw the practical side of CFR and how effective it can be. In this section, we’ll understand the theory underlying it. This will culminate with Blackwell’s Approachability Theorem, which generalizes the minimax theorem of two-player zero-sum games to repeated games with vector-valued payoffs. This is a challenging session, but the payoff will be a much keener understanding of CFR’s strengths.
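For orientation while reading (this paraphrases the standard definition rather than PLG's exact wording): in a repeated game with vector-valued payoffs, a closed set \(S\) is approachable by player one if she has a strategy guaranteeing that the average payoff vector converges to \(S\) no matter what the opponent does,
\[\lim_{T \to \infty} d\!\left(\frac{1}{T}\sum_{t=1}^{T} x_{t},\; S\right) = 0 \quad \text{almost surely},\]
where \(x_t\) is the vector payoff at round \(t\) and \(d\) is the distance to the set. Blackwell’s theorem characterizes which convex sets are approachable, and regret minimization follows by approaching the nonpositive orthant of the regret vector.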
Required:
- PLG: Sections 7.3 - 7.7, 7.9.
Optional:
- A Simple Adaptive Procedure Leading to Correlated Equilibrium.
- Prof. Johari’s 2007 Class - 11.
- Prof. Johari’s 2007 Class - 13.
- Prof. Johari’s 2007 Class - 14.
- Prof. Johari’s 2007 Class - 15.
Questions:
PLG: Prove Lemma 7.1.
\(\textit{Lemma}\): A probability distribution \(P\) over the set of all \(K\)-tuples \(i = (i_{1}, \ldots, i_{K})\) of actions is a correlated equilibrium iff, for every player \(k \in \{1, \ldots, K\}\) and actions \(j, j' \in \{1, \ldots, N_{k}\}\), we have
\[\sum_{i: i_{k} = j} P(i)\big(\ell(i) - \ell(i^{-}, j')\big) \leq 0,\]
where \((i^{-}, j') = (i_{1}, \ldots, i_{k-1}, j', i_{k+1}, \ldots, i_{K})\).
It’s brushed over in the proof of Theorem 7.5 in PLG, but prove that if a set \(S\) is approachable, then every halfspace \(H\) containing \(S\) is approachable.
Solution
Because \(S \subseteq H\) and \(S\) is approachable, player one can simply reuse the strategy that witnesses the approachability of \(S\) (see Johari's Lecture 13): it forces the average payoff vector to converge to \(S\), and hence to the larger set \(H\), so \(H\) is approachable as well.
Motivation: Let’s read the paper! A summary of what’s going on to help with your understanding:
DeepStack runs counterfactual regret minimization at every decision. However, it uses two separate neural networks, one for after the flop and one for after the turn, to estimate counterfactual values without having to keep running CFR past those points. These networks are trained beforehand and greatly cut down the search space at inference time. Each network takes as input the size of the pot and the current Bayesian ranges of both players over all hands, and outputs the counterfactual values of each hand for each player.
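As a rough illustration of that input/output interface, here is a toy sketch of a counterfactual value estimator. The layer sizes, the plain feed-forward body, and the ReLU nonlinearity are placeholders rather than the architecture from the paper (which, among other things, also post-processes its outputs to respect the zero-sum structure); ranges here are over the 1,000 hand buckets discussed in the questions below.

```python
import numpy as np

N_BUCKETS = 1000  # number of hand clusters, as in DeepStack's card abstraction

def init_value_net(hidden=500, n_layers=3, seed=0):
    """Random weights for a toy feed-forward counterfactual value estimator.

    Input:  pot size (1) + player 1 range (N_BUCKETS) + player 2 range (N_BUCKETS).
    Output: counterfactual values, one per bucket per player (2 * N_BUCKETS).
    """
    rng = np.random.default_rng(seed)
    sizes = [1 + 2 * N_BUCKETS] + [hidden] * n_layers + [2 * N_BUCKETS]
    return [(rng.normal(0, 0.01, (m, n)), np.zeros(n)) for m, n in zip(sizes, sizes[1:])]

def predict_counterfactual_values(params, pot_size, range_p1, range_p2):
    x = np.concatenate(([pot_size], range_p1, range_p2))
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0)  # ReLU here for simplicity
    return x[:N_BUCKETS], x[N_BUCKETS:]  # values for player 1, player 2

params = init_value_net()
uniform_range = np.full(N_BUCKETS, 1.0 / N_BUCKETS)
v1, v2 = predict_counterfactual_values(params, pot_size=0.1, range_p1=uniform_range,
                                        range_p2=uniform_range)
print(v1.shape, v2.shape)  # (1000,) (1000,)
```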
In addition to DeepStack, we also include Libratus as required reading. This paper highlights Game Theory and CFR as the really important concepts in this curriculum; deep learning is not necessary to build a champion Poker bot.
Required Reading:
- DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker.
- DeepStack Supplementary Materials.
- Libratus.
- Michael Bowling on DeepStack.
Optional Reading:
- DeepStack Implementation for Leduc Hold’em.
- Noam Brown on Libratus.
- Depth-Limited Solving for Imperfect-Information Games: This paper is fascinating because it achieves a poker-playing bot almost as good as Libratus while using a fraction of the computation and disk space.
Questions:
- What are the differences between the approaches taken in DeepStack and in Libratus?
Solution
Here are some differences:
- A clear difference is that DeepStack uses a deep neural network to reduce the necessary search space, and Libratus does not.
- DeepStack does not use any action abstraction and instead melds those considerations into the pot size input. Libratus does use a dense action abstraction but adapts it each game and additionally constructs new sub-games on the fly for actions not in its abstraction.
- DeepStack uses card abstraction by first clustering the hands into 1,000 buckets and then considering probabilities over that range (a toy bucketing sketch appears after these questions). Libratus does not use any card abstraction preflop or on the flop, but does use it on later rounds, such that the game's \(10^{61}\) decision points are reduced to \(10^{12}\).
- DeepStack does not have a way to learn from recent games without further neural network training. On the other hand, Libratus improves via a background process that adds novel opponent actions to its action abstraction.
- Can you succinctly explain “Continual Re-solving”?
- Can you succinctly explain AIVAT?
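Lastly, to make the card-abstraction point from the solution above more tangible, here is a toy bucketing sketch. The features are random stand-ins and the bucket count is arbitrary; neither DeepStack's nor Libratus's actual abstraction pipeline is this simple:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Pretend feature vector per hand, e.g. equity vs. a range over several board
# runouts. Here we simulate the 1,326 two-card starting combos with random
# 5-dimensional features as a stand-in.
n_hands, n_features, n_buckets = 1326, 5, 50
hand_features = rng.random((n_hands, n_features))

# Cluster similar hands together; all hands in a bucket share a strategy.
buckets = KMeans(n_clusters=n_buckets, n_init=10, random_state=0).fit_predict(hand_features)

print(np.bincount(buckets)[:10])  # rough bucket sizes
```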