same cell; for details of calculations leading to payoffs, see appendix). This calculation follows from the following logic: each time the temptation is low, player 1 cooperates and gets a, player 2 gets b, and the game continues with probability w, until the first time the temptation is high. We refer to the strategy in which player 1 cooperates without looking (top row) as CWOL. We also refer to the strategy pair in which player 1 CWOLs and player 2 continues if player 1 CWOLs (first column) as CWOL. We refer to the strategy pairs in which player 1 cooperates with or without looking and player 2 continues if player 1 cooperates (first and second rows, middle column) as CWL. We refer to the strategy pair in which player 1 always defects and player 2 always exits (bottom row, rightmost column) as ALLD. ALLD is always an equilibrium of the envelope game. CWOL is an equilibrium if $a/(1-w) > p c_l + (1-p) c_h$. CWL is an equilibrium if $a/(1-w) > c_h$. Because $c_l < c_h$, we have $p c_l + (1-p) c_h \le c_h$, so the CWL condition implies the CWOL condition: the region in which CWL is an equilibrium is a subset of the region in which CWOL is an equilibrium.

Figure 3: Learning Dynamics of the Envelope Game

We apply the replicator dynamic to the envelope game, restricted to the strategies represented in figure 2. The replicator dynamic describes how strategy frequencies evolve over time under the assumption that the growth rate of each strategy within its population is proportional to the difference between that strategy's fitness and the average fitness of the population. The replicator dynamic can also be interpreted as a model of learning processes such as reinforcement learning or prestige-biased imitation. We run 1000 time series with randomly seeded strategy frequencies for a range of values of a, and record how often they stabilize in one of the strategy pairs identified in figure 2, or in a behaviorally equivalent equilibrium, as presented in the simplexes. The value of a varies along the x-axis. The y-axis shows frequencies, and each colored line gives the frequency with which the corresponding strategy pair is reached. The parameter region where the strategy pair is supported as an e