Abstract
In repeated principal‐agent problems and games, more outcomes are implementable when performance signals are privately observed by a principal or mediator with commitment power than when the same signals are publicly observed and form the basis of a recursive equilibrium. We investigate the gains from nonrecursive equilibria (e.g., “review strategies”) based on privately observed signals. Under a pairwise identification condition, we find that the gains from nonrecursive equilibria are “small”: their inefficiency is of the same 1 − δ power order as that of recursive equilibria. Thus, while private strategies or monitoring can outperform public ones for a fixed discount factor, they cannot accelerate the power rate of convergence to the efficient payoff frontier when the folk theorem holds. An implication is that the gains from withholding performance feedback from agents are small when the parties are patient.
Introduction
Most analysis of repeated moral hazard problems and games focuses on contracts and equilibria that are recursive in the players' continuation values. This approach is without loss in single-agent problems with public performance signals (Spear and Srivastava (1987)). It is also without loss in repeated games with imperfect public monitoring, if attention is restricted to equilibria in pure strategies or in strategies that depend only on the public signals (Abreu, Pearce, and Stacchetti (1990), Fudenberg, Levine, and Maskin (1994)). In contrast, in single-agent problems where the principal privately observes performance, or in repeated games where signals are privately observed by a mediator, more payoffs are implementable as compared to the case where the same signals are publicly observed, as concealing signals reduces the players' available deviations. Similarly, focusing on pure or public equilibria in repeated games with public monitoring is not without loss (Kandori and Obara (2006)). Yet, characterizing the equilibrium payoff set with private signals or strategies is intractable, precisely because this set lacks a tractable recursive structure (Kandori (2002)). The extent of the possible gains from nonrecursive equilibria based on private signals or strategies over recursive equilibria based on public signals is thus an open question, which forms the subject of the current paper.
We consider discounted repeated games where in each period players take actions a and a signal y is drawn from a distribution with full support. We compare the equilibrium payoff sets in a version of the game with public monitoring, where the signal y is publicly observed and attention is restricted to equilibria in public strategies, and a version with private monitoring, where the signal y is observed only by a principal (or mediator) with commitment power, who privately recommends actions to the players. We call these two versions of the game the public game and the blind game. By the revelation principle, for any discount factor δ, the equilibrium payoff set is weakly larger in the blind game than in the public game. Our question is, how much larger?
For any fixed discount factor δ, this question is difficult to answer in any generality, because characterizing equilibrium payoffs in the blind game is intractable. We instead adopt a rate of convergence approach: under standard identification conditions that ensure that efficiency is attainable in the limit, how quickly does inefficiency vanish as δ → 1 in the most efficient equilibrium in the public game as compared to the blind game?
Our main result is that inefficiency is of the same power order of 1 − δ in both games. Thus, while private strategies or monitoring can outperform public ones for a fixed discount factor, they cannot accelerate the rate of convergence to the efficient payoff frontier when the folk theorem holds, except possibly for log terms in 1 − δ. In this sense, the gains from nonrecursive equilibria are small when players are patient.
Our results have implications for the design of principal-agent relationships. An important design variable in such relationships is the amount of performance feedback provided to the agent. While providing feedback can have practical benefits that are not captured by our model, a benefit of withholding feedback is that this facilitates nonrecursive contracting by making the game blind rather than public. However, our results show that this benefit of withholding feedback is small when the parties are patient.
The high-level intuition for our results is that, as compared to a recursive contract where the agents' continuation values are revealed in every period, pooling information across periods improves monitoring precision but also necessitates larger rewards and punishments, which reduces the scope for providing incentives by transferring future surplus between the agents rather than destroying it through mutual punishments. Our analysis shows that these two effects essentially cancel out, so that little is gained by pooling information.
A subtlety in our results is that, while inefficiency is always of the same power order in the public game and the blind game, this order depends on the curvature of the boundary of the feasible payoff set. If the boundary is smooth with positive curvature (as in Green and Porter (1984), Spear and Srivastava (1987), Sannikov (2007, 2008), or Sadzik and Stacchetti (2015)), inefficiency is of order 1 − δ. In this case, the first-order inefficiency associated with small continuation payoff movements along the payoff boundary is zero. We show that this implies that inefficiency in the public and blind games differs only by a constant factor: i.e., the rates of convergence are identical. Moreover, for a class of smooth principal-agent models (similar to Spear and Srivastava (1987), or Sannikov (2008)), inefficiency in the public and blind games is identical up to a first-order approximation.
In contrast, if the boundary of the feasible payoff set is kinked (e.g., if there are finitely many actions), inefficiency is of power order (1 − δ)^{1/2}. In this case, the first-order inefficiency associated with small continuation payoff movements is positive. We show that this greater inefficiency leads to a greater value of withholding feedback: now, inefficiency in the public and blind games can differ by a log factor in 1 − δ. Thus, while the value of withholding feedback is always “small” (no improvement in the power rate of convergence), it is somewhat less small in the kinked case (where there can be a log-factor improvement) than in the smooth case (where there is at most a constant-factor improvement, with no first-order improvement whatsoever in standard principal-agent models).
Methodologically, we develop a new technique for bounding equilibrium payoffs in repeated games with private monitoring. The starting point is that continuation payoff rewards or punishments incur an efficiency loss related to the curvature of the boundary of the feasible payoff set, while providing incentives that are proportional to a likelihood ratio difference. Since the likelihood ratio difference is a martingale increment (its expectation under the actions actually played equals 0), large deviations theory can be used to bound the cumulative likelihood ratio difference over any number of periods. This bound connects the inefficiency and “incentive strength” of any strategy profile, so that any equilibrium where players do not take myopic best responses must incur a certain amount of inefficiency, regardless of whether signals are public or private.
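As a concrete illustration of the martingale property, the following minimal sketch (a hypothetical binary signal with made-up probabilities, not the paper's model) checks that the likelihood ratio difference has zero expectation under the actions actually played:

```python
# Likelihood ratio difference for a hypothetical binary signal: its expectation
# under the true action is zero, so per-period differences are martingale
# increments.
p_good = {"a": 0.7, "a_dev": 0.5}  # Pr(y = 1 | action); full support

def lr_difference(y):
    """(p(y | a_dev) - p(y | a)) / p(y | a) for the realized signal y."""
    p_true = p_good["a"] if y == 1 else 1 - p_good["a"]
    p_dev = p_good["a_dev"] if y == 1 else 1 - p_good["a_dev"]
    return (p_dev - p_true) / p_true

mean = sum((p_good["a"] if y == 1 else 1 - p_good["a"]) * lr_difference(y)
           for y in (0, 1))
print(f"E[likelihood ratio difference] = {mean}")  # ~0 up to rounding
```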
Relation to the literature
Our finding that the gains from nonrecursive equilibria are small contrasts with two strands of prior literature that find large gains. These strands share the feature that continuation value transfers are impossible with public strategies. This feature reduces the efficiency of public strategies and thereby generates large gains from nonrecursive private strategies.
First, Holmström and Milgrom (1987) study a dynamic principal-agent model where the agent exerts effort over T periods and consumption occurs at the end of the game. The value of withholding feedback is large: without feedback, first-best profit can be approximated as T → ∞ using a “review strategy” that aggregates signals over the T periods and harshly penalizes the agent if a count of signal realizations falls below a threshold; with feedback, optimal contracts are linear in the count of signal realizations, and profits are bounded away from the first best for all T. The key difference from our setup is that Holmström and Milgrom's model is not a repeated game (as consumption only occurs once), so efficiency cannot be improved by transferring continuation payoffs over time.
Second, several papers study principal-agent problems or games that, while repeated, do not permit continuation value transfers. Abreu, Milgrom, and Pearce (1991) and Sannikov and Skrzypacz (2007) consider settings without pairwise identifiability, while Matsushima (2004) and Fuchs (2007) restrict attention to block belief-free equilibria. These settings preclude continuation value transfers, and consequently these papers all find that efficiency is attained as δ → 1 only when feedback is withheld.
In past work (Sugaya and Wolitzky, 2017, 2018), we showed that the value of withholding feedback (“maintaining privacy”) is large in some specific repeated and dynamic games when δ is bounded away from 1. For example, our 2018 paper examined how maintaining privacy can help sustain multimarket collusion. In contrast, the current paper shows that the value of privacy in repeated games is small when δ is close to 1.
We also relate to the broader literature on feedback in dynamic agency and games. We consider complete information repeated games without payoff-relevant state variables, so feedback concerns only past performance, which is payoff-irrelevant in the continuation game. In contrast, most of the literature on feedback in dynamic agency considers dynamic games with additional state variables, such as an agent's ability (Ederer (2010), Smolin (2021)), other agents' progress in a tournament (Gershkov and Perry (2009), Aoyagi (2010), Ely, Georgiadis, and Rayo (2025)), whether a project has been completed (Halac, Kartik, and Liu (2017), Ely, Georgiadis, Khorasani, and Rayo (2023)), or the evolution of an exogenous state (Ely and Szydlowski (2020), Orlov, Skrzypacz, and Zryumov (2020), Ball (2023)). An exception is Lizzeri, Meyer, and Persico (2002), who examine optimal two-period agency contracts with and without a “midterm review.”
We also contribute to the literature on review strategies, introduced by Rubinstein (1979), Rubinstein and Yaari (1983), and Radner (1985), and developed by Abreu, Milgrom, and Pearce (1991) and Matsushima (2001, 2004). These papers show that review strategies can support efficient outcomes in the limit as δ → 1 (or when there is no discounting at all). In contrast, we show that review strategies cannot greatly outperform recursive contracts when δ is close to 1.
Methodologically, the closest papers are Hörner and Takahashi (2016), who build on Fudenberg, Levine, and Maskin's recursive methods to show that inefficiency is of order √((1 − δ) log(1/(1 − δ))) in repeated finite-action games with public monitoring; and Sugaya and Wolitzky (2023), who bound the strength of players' equilibrium incentives in repeated finite-action games with private monitoring. Rather than bounding incentives, the current paper derives a tradeoff between efficiency and incentives (expressed through Lagrange multipliers, e.g., program (4) below) and uses it to characterize the rate of convergence. In addition, the arguments in our 2023 paper are based on variance decomposition, while the current paper requires more precise estimates from martingale large deviations theory.
Finally, our exact characterization of first-order inefficiency in repeated principal-agent models relates to Sannikov (2008) and Sadzik and Stacchetti (2015), who derive similar results under public monitoring in continuous time or “frequent action” models. Here, our main contribution is showing that withholding feedback leaves first-order inefficiency unchanged.
Outline
The paper is organized as follows. Section 2 describes the model. Section 3 gives an informal overview of our results. Section 4 establishes general upper bounds on equilibrium efficiency. Section 5 establishes that these bounds are attainable in public equilibria (excepting a log factor in the finite-action case). Combining these results implies that the gains from nonrecursive equilibria are small. Section 6 gives a stronger result for principal-agent problems. Section 7 discusses extensions.
Preliminaries
This section introduces our model of repeated games with public monitoring and blind repeated games.
A stage game consists of a finite set of players N, a product set of actions A = ∏_{i∈N} A_i, and a payoff function u_i for each i ∈ N. We assume that each A_i is a nonempty, compact metric space and each u_i is continuous. By the Debreu–Fan–Glicksberg theorem, the stage game admits a Nash equilibrium in mixed actions.
We fix some basic notation: the sets of stage-game Nash and correlated equilibria are and ; the feasible payoff set is with boundary bnd(F); the sets of stage-game Nash and correlated equilibrium payoffs are and ; the Euclidean metric and norm on are and ; and the set of unit vectors (or directions) in is .
Recall that a payoff vector v ∈ F is an exposed point of F if there exists a direction λ such that v uniquely maximizes λ·w over w ∈ F: that is, if the set of exposing directions is nonempty. If A is finite, the sets of exposed and extreme points of F coincide; if A is infinite, the exposed points are a dense subset of the extreme points. An example at the end of Section 4.1 motivates focusing on exposed points.
A monitoring structure consists of a set of possible signal realizations Y and a family of conditional probability distributions p(·|a). We assume that either Y is finite and y is drawn according to a probability mass function p(y|a), or Y is measurable and y is drawn according to a density p(y|a): we use the same notation for both cases. We assume p(y|a) > 0 for all y ∈ Y and a ∈ A. This full support assumption is crucial and, in particular, excludes perfect monitoring.
We also require that the monitoring structure satisfies the following key assumption.
Assumption
There exists a number K > 0 such that, for any player i, action profile a, and deviation a_i′, the likelihood ratio difference x(y) = p(y | a_i′, a_{−i})/p(y | a) − 1 satisfies E_{y∼p(·|a)}[exp(t·x(y))] ≤ exp(Kt²/2) for all t ∈ ℝ.
Assumption 1 says that the likelihood ratio difference between p(·| a_i′, a_{−i}) and p(·| a) has a sub-Gaussian distribution, where the number K is called a variance proxy. For example, Assumption 1 holds if Y is finite, or if Y = ℝ^D and y = f(a) + ε, where f is a deterministic function and ε has a multivariate normal distribution with covariance matrix independent of a. As we will see, Assumption 1 lets us apply results from large deviations theory to bound the power of tail tests that aggregate signals over many periods.
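To illustrate Assumption 1 in the finite-signal case (where the likelihood ratio difference is bounded, hence sub-Gaussian), the following sketch uses made-up probabilities; the value K = 2 is just one valid variance proxy here, not a claimed tight constant:

```python
import numpy as np

p_true = np.array([0.6, 0.3, 0.1])  # p(y | a), full support
p_dev = np.array([0.4, 0.4, 0.2])   # p(y | a_i', a_-i)
x = (p_dev - p_true) / p_true       # likelihood ratio difference by signal

K = 2.0
ts = np.linspace(-5.0, 5.0, 201)
mgf = np.array([p_true @ np.exp(t * x) for t in ts])
print(np.isclose(p_true @ x, 0.0),                  # mean zero: martingale increment
      bool((mgf <= np.exp(K * ts**2 / 2)).all()))   # sub-Gaussian moment bound
```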
In a repeated game with public monitoring, in each period t = 1, 2, …, each player i takes an action a_i ∈ A_i, and then a signal y is drawn according to p(·|a) and is publicly observed. A history for player i at the beginning of period t consists of her past actions and the past public signals. A strategy for player i maps histories to distributions over actions, and a strategy is public if it depends on a history only through its public component, the past signals. Players choose strategies to maximize discounted expected payoffs, with common discount factor δ ∈ (0, 1). A perfect public equilibrium (PPE) is a profile of public strategies that, beginning at any period t and any public history, forms a Nash equilibrium from that period on. We refer to the repeated game with public monitoring with stage game G, monitoring structure p, and discount factor δ as the public game; its set of PPE payoff vectors is the set of attainable payoffs in a PPE where signals are publicly observed.
In a blind repeated game, the players do not observe signals directly but are assisted by a mediator with commitment power. In each period t, (i) the mediator privately recommends an action to each player i, (ii) each player i takes an action a_i ∈ A_i, and (iii) a signal y is drawn according to p(·|a) and is observed only by the mediator. A history for the mediator at the beginning of period t consists of the past recommendation profiles and signals, while a history for player i just before she takes an action in period t consists of her own past recommendations and actions, together with her current recommendation. A strategy for the mediator maps histories to distributions over recommendation profiles, while a strategy for player i maps histories to distributions over actions. We refer to the blind repeated game with stage game G, monitoring structure p, and discount factor δ as the blind game; its set of Nash equilibrium payoff vectors (taking the union over all possible mediator strategies) is the set of attainable payoffs in a Nash equilibrium where signals are privately observed by a mediator.
By standard arguments (similar to Forges (1986)), any Nash equilibrium outcome in the public game (i.e., any equilibrium distribution over infinite paths of action profiles and signals) can also be implemented by a Nash equilibrium in the blind game where the players follow the mediator's recommendations on path. Since PPE is a refinement of Nash equilibrium, it follows that every PPE payoff vector of the public game is also a Nash equilibrium payoff vector of the blind game. Our goal is to evaluate the advantage of arbitrary equilibria based on private signals over recursive equilibria based on public signals: that is, to assess how much larger the blind game's equilibrium payoff set is.
Remark
The model is easily adapted to allow a player with commitment power (such as a principal with full commitment power in a principal-agent model) or one or more players with perfectly observed actions (such as a principal who offers contracts each period in a relational contracting model). A player with commitment power is treated like any other player, except that no incentive constraints are imposed on her strategy. For example, in a principal-agent model, is the set of mixed action profiles where the agent does not have a profitable deviation. Moreover, it suffices to impose full support (and sub-Gaussianity) only for the agent, so that for all a, that agree on the principal's action. Similarly, to extend our results to the case where some players' actions are perfectly observed, let be the set of players with observable actions, and assume that deviations by players do not affect the support of p, so that for all , , and . Then Theorem 1 below applies for any that cannot be attained by an action profile distribution α such that for each player and each manipulation (where these objects are defined in Section 4.2), while Theorem 2 applies verbatim.
Overview of results
We first provide an informal overview of our results. We focus on two leading cases: finite stage games and games where the boundary of the feasible payoff set has positive curvature.
Finite games
With a finite stage game, Hörner and Takahashi (2016) showed that the minimum distance between any PPE payoff vector and any extreme, nonstatic Nash payoff vector is of order √((1 − δ) log(1/(1 − δ))). This result relies on Fudenberg, Levine, and Maskin's (1994) recursive characterization of PPE and is generalized by our Theorem 2.
In contrast, our main result concerns arbitrary Nash equilibrium payoffs in the blind game. There is a wide range of nonrecursive equilibria in the blind game. A leading example of these equilibria is given by review strategies (Radner (1985), Abreu, Milgrom, and Pearce (1991), Matsushima (2004)), which aggregate signals over T periods—during which the players take constant actions—before adjusting the players' actions. Heuristically, an optimal review strategy pools information for T ≈ (1 − δ)^{−1} periods and then applies a penalty if the number of “good signals” over these periods falls short of a cutoff. Call the number of standard deviations by which the number of good signals falls short of its mean the score. Since the number of good signals, normalized by √T, is approximately normally distributed, for any cutoff score z the probability that a single signal is pivotal is of order e^{−z²/2}/√T. As stage game payoffs are of order 1, incentive compatibility requires that the pivot probability is at least of order 1 − δ, so that z is at most of order √(log(1/(1 − δ))), which in turn implies that the review strategy's “false positive rate” (and hence its minimum inefficiency) is of order (1 − δ)^{1/2}. Thus, review strategies improve on PPE by at most a factor of √(log(1/(1 − δ))). Our Theorem 1 implies that this factor is unimprovable. Thus, combining Theorems 1 and 2 shows that withholding feedback accelerates convergence to efficiency by at most a factor of √(log(1/(1 − δ))). Moreover, our Proposition 1 constructs an equilibrium that attains this factor in a one-sided prisoner's dilemma, which shows that this result is tight.
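The following back-of-the-envelope computation illustrates this tradeoff numerically. The scalings (review length T = 1/(1 − δ), a pivot-probability requirement of 1 − δ) are assumptions made for illustration; only the orders of magnitude matter:

```python
import numpy as np
from scipy.stats import norm

for delta in (0.99, 0.999, 0.9999):
    T = 1.0 / (1.0 - delta)              # assumed review length
    zs = np.linspace(0.0, 10.0, 10001)
    pivot = norm.pdf(zs) / np.sqrt(T)    # chance one signal moves the count past z
    z = zs[pivot >= 1.0 - delta].max()   # largest incentive-compatible cutoff
    # The false positive rate tracks sqrt(1 - delta) up to constants.
    print(f"delta={delta}: z = {z:.2f}, false positive ~ {norm.cdf(-z):.1e}, "
          f"sqrt(1-delta) = {np.sqrt(1.0 - delta):.1e}")
```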
Positive curvature
Now consider an infinite stage game where the boundary of the feasible payoff set has positive curvature. In this case, Theorem 2 shows that PPE in the public game can attain inefficiency of order 1 − δ. As we explain following the statement of Theorem 2, this improved efficiency relative to finite games is obtained because a smooth set of equilibrium payoffs can approximate a smooth set of feasible payoffs more closely than a kinked set of feasible payoffs. Conversely, Theorem 1 shows that arbitrary Nash equilibria in the blind game cannot attain inefficiency of order less than 1 − δ. Thus, in the positive curvature case, withholding feedback does not accelerate convergence to efficiency. Moreover, Theorem 3 shows that, in principal-agent problems, withholding feedback does not reduce first-order inefficiency.
Maximum efficiency with arbitrary strategies
Main result
Our main result gives an upper bound for the rate of convergence of blind-game equilibrium payoffs toward an exposed point v of F that is not attainable as a static correlated equilibrium. As discussed above, the bound depends on the order of curvature of the boundary of F at v.
Definition
Fix an exposed point v of F. For any β ≥ 1, the boundary of F has max-curvature of order at least β at v if, for every exposing direction λ, there exists c > 0 such that λ·(v − w) ≥ c‖v − w‖^β for all w ∈ F.
This definition says that moving away from an exposed point v in the convex set F entails an efficiency loss of order at least β, relative to Pareto weights λ: heuristically, λ·(v − w) is approximated by a power function of degree β at v. Note that only β ≥ 1 is possible because F is convex, and the set of exposing directions is nonempty by the definition of an exposed point. To understand the definition, the key cases to consider are β = 1, β = 2, and the limit case β → ∞ (see the numerical sketch following this list).
- The case β = 1 arises when the stage game G is finite. This implies a first-order efficiency loss from moving away from any extreme point. This case is studied by Abreu, Pearce, and Stacchetti (1990), Fudenberg, Levine, and Maskin (1994), Hörner and Takahashi (2016), and Sugaya and Wolitzky (2023).
- The case β = 2 arises when the boundary of F has positive curvature. This case is studied by Green and Porter (1984), Spear and Srivastava (1987), Sannikov (2007, 2008), and Sadzik and Stacchetti (2015). More generally, if β ∈ (1, 2] then the boundary of F has nonzero curvature: its curvature is positive but finite if β = 2 and is infinite if β ∈ (1, 2).
- The case β → ∞ arises in the limit where the boundary of F is linear at v. This limit case occurs in repeated games with transferable utility, as in Athey and Bagwell (2001), Levin (2003), and Goldlücke and Kranz (2012).
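To illustrate the first two cases, the following sketch (with an assumed disk and an assumed square standing in for F) tracks the efficiency loss λ·(v − w) as w moves a distance eps along the boundary away from v:

```python
import numpy as np

eps = np.logspace(-4, -1, 4)

# Disk boundary (positive curvature, beta = 2): v = (0, 1), lambda = (0, 1).
loss_disk = 1.0 - np.cos(eps)        # ~ eps^2 / 2: no first-order loss

# Square boundary (kink at the corner, beta = 1): v = (1, 1), lambda = (1, 1)/sqrt(2).
loss_square = eps / np.sqrt(2.0)     # ~ eps: first-order loss

for e, l2, l1 in zip(eps, loss_disk, loss_square):
    print(f"eps = {e:.0e}: disk loss = {l2:.1e} (order eps^2), "
          f"square loss = {l1:.1e} (order eps)")
```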
The following is our main result.
Theorem
Fix an exposed point v that is not attainable as a static correlated equilibrium, where bnd(F) has max-curvature of order β at v, and fix an exposing direction λ. Then there exists c > 0 such that every Nash equilibrium payoff vector w of the blind game satisfies λ·(v − w) ≥ c(1 − δ)^{β/2}.
The key implications of Theorem 1 are as follows:
- For Pareto weights where welfare is maximized at a kink in bnd(F), equilibrium inefficiency in the blind game is at least of order (1 − δ)^{1/2}.
- For Pareto weights where welfare is maximized at a point where bnd(F) has positive curvature, equilibrium inefficiency in the blind game is at least of order 1 − δ.
We will see that both of these bounds—as well as the bound for intermediate β ∈ (1, 2)—are tight in the blind game. Moreover, the bound in the kinked case remains tight up to log-factor slack in the public game, while the bound in the β ∈ (1, 2] case remains tight up to constant-factor slack in the public game. These results imply that the gains from nonrecursive equilibria are small at any point of nonzero curvature when players are patient.
We outline the proof of Theorem 1 in the next subsection. The basic logic is that if a repeated game Nash equilibrium gives payoffs close to v, then the stage game payoff must be close to v almost all the time along the equilibrium path of play. Since signals have full support, this implies that continuation payoffs usually remain close to v even after low-probability signal realizations, and hence that equilibrium continuation play does not vary much with the signal realizations. But then, since v is not attainable as a static correlated equilibrium, we can conclude that δ must be so high that even small variations in continuation play can provide strong incentives.
We mention a couple of technical aspects of the statement of Theorem 1. First, generically, the condition is equivalent to : since v is extremal, the distinction only matters in the nongeneric case where v is attained at two different pure action profiles. Second, we focus on exposed points because the condition (i.e., ) cannot be weakened to . To see this, consider the stage game
Proof sketch for Theorem 1
We sketch the proof of Theorem 1, deferring the details to the Appendix. Fix any direction λ and any Nash equilibrium payoff vector w of the blind game. We wish to derive a lower bound for λ·(v − w)—the inefficiency of w in direction λ—which holds for any such equilibrium payoff.
We introduce some notation. Note that any outcome μ (a distribution over infinite paths of action profiles and signals) defines a marginal distribution μ_t over period-t action profiles, as well as an occupation measure over action profiles, defined as the normalized discounted average (1 − δ)∑_{t≥1} δ^{t−1} μ_t.
Now, for each player i, let M_i denote the set of functions from A_i to itself, which we call manipulations. For any outcome μ, player i, and manipulation m_i ∈ M_i, define the deviation gain as the increase in player i's expected stage-game payoff when she plays m_i(a_i) in place of each recommended action a_i.
A simple necessary condition for an outcome μ to be consistent with equilibrium play (Lemma 6 in the Appendix) is that, for each player i, manipulation , and period t, we have
To prove the theorem, it remains to bound (3) as a function of δ and β. To do so, consider the inner problem where μ is fixed and . Let denote the Lagrange multiplier on the period t incentive constraint, and form the Lagrangian
Tightness of the efficiency bound in the kinked case
We will see in the next section that inefficiency of order (1 − δ)^{β/2} is attainable when β ∈ (1, 2] under public monitoring. This implies that the lower bound on inefficiency in Theorem 1 cannot be improved when β ∈ (1, 2] (the smooth, nonzero curvature case). Here, we show that in the kinked case (β = 1) inefficiency of order (1 − δ)^{1/2} is sometimes attainable in the blind game. This shows that the lower bound on inefficiency in Theorem 1 also cannot be improved when β = 1. Consequently, withholding feedback can accelerate the rate of convergence by at most a factor of √(log(1/(1 − δ))) in the kinked case.
We consider a one-sided prisoner's dilemma, where the payoff matrix is
Proposition
In the one-sided prisoner's dilemma, there exists c > 0 such that, for any sufficiently large δ, there exists a Nash equilibrium payoff vector of the blind game within distance c(1 − δ)^{1/2} of the efficient payoff v.
The proof constructs a review strategy with inefficiency of order (1 − δ)^{1/2}, as sketched in Section 3.1.
Attainable efficiency with public strategies
We now show that the maximum efficiency levels identified in Theorem 1 are attainable under public monitoring in the smooth, nonzero curvature case, and are attainable up to a log factor in the kinked case. To this end, denote the set of feasible and strictly individually rational payoffs by F*. For , define .
The following definition is a counterpart of Definition 1, adjusted to apply to all boundary points rather than only exposed points. It says that moving away from v along the boundary of F* entails an efficiency loss of order at most β, relative to Pareto weights λ.
Definition
Fix a boundary point v of F*. For any β ≥ 1, the boundary of F* has min-curvature of order at most β at v if, for every direction λ supporting F* at v, there exists C > 0 such that nearby boundary points w satisfy λ·(v − w) ≤ C‖v − w‖^β.
Note that, at any exposed point v, the min-curvature of bnd(F*) is at least 1 and at most the max-curvature.
The following assumption generalizes standard identification conditions for the public-monitoring folk theorem to the case where action sets can be infinite.
Assumption
There exists such that the following conditions hold:
- i. For each i, there exists a minmax profile against i, , and for each , such that
- ii. For each , , and with , there exists such that
Intuitively, Assumption 2 says that, when payoff transfers of magnitude at most are available, players −i can be incentivized to minmax player i, and player i can be incentivized to take a given action via transfers from player j without affecting player j's incentive to take a given action.
We consider the rate of convergence of PPE payoffs toward a strictly individually rational payoff vector v. For finite stage games, Hörner and Takahashi (2016) show that this rate equals √((1 − δ) log(1/(1 − δ))). Thus, withholding feedback can accelerate the rate of convergence by at most a factor of √(log(1/(1 − δ))) in finite-action games. We now show that whenever the boundary of F* has nonzero curvature (β ∈ (1, 2]), the rate equals (1 − δ)^{β/2}. (We discuss the zero curvature case below.) Thus, withholding feedback cannot accelerate the rate of convergence in smooth games with nonzero curvature.
We require the standard assumption that and further exclude payoff vectors where some player obtains her maximum feasible payoff.
Theorem
Assume that Assumption 2 holds, and fix any v and λ satisfying λ_i > 0 for all i, where bnd(F*) has min-curvature of order β at v. Then there exists c > 0 such that, for any sufficiently large δ, some PPE payoff vector w satisfies λ·(v − w) ≤ c(1 − δ)^{β/2}.
Theorem 2 builds on Fudenberg, Levine, and Maskin (1994), Hörner and Takahashi (2016), and Sugaya and Wolitzky (2023). As these authors showed, a given level of inefficiency relative to an exposed point v and a direction λ is attainable under public monitoring if it equals the distance in direction λ between v and a self-generating ball B. We thus seek a self-generating ball at distance of order (1 − δ)^{β/2} from v in direction λ. To this end, let d be the desired distance, and (without loss) let u be the closest point to v in B (see Figure 1). Consider decomposing u into an instantaneous payoff v and continuation payoffs that lie on the translated tangent hyperplane H with normal vector λ passing through the point u + ((1 − δ)/δ)(u − v). Under Assumption 2, the continuation payoffs can be chosen to enforce v on H if the diameter of B ∩ H, which we denote by x, is of order 1 − δ. At the same time, denoting the radius of B by r and noting that H lies at depth Δ = ((1 − δ)/δ)d below u, the Pythagorean theorem gives x/2 = √(r² − (r − Δ)²), and hence x ≈ √(2(1 − δ)rd). It follows that the product rd is of order x²/(1 − δ) = 1 − δ, and hence d is of order (1 − δ)/r. Finally, for a point v where the (max-)curvature of bnd(F) equals β, a ball B with radius r of order d^{(2−β)/β} and center v − (d + r)λ, where d is of order (1 − δ)^{β/2}, lies entirely within F. For example, if β = 1 then r and d are both of order (1 − δ)^{1/2}, and thus shrink at the same rate as δ → 1; while if β = 2 then r is of order 1 and d is of order 1 − δ, so B simply shifts toward v as δ → 1.
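A quick numerical check of the Pythagorean step above (pure geometry; the parameter values are arbitrary): the hyperplane H sits Δ = ((1 − δ)/δ)d below the point of B closest to v, so the chord B ∩ H has half-width √(r² − (r − Δ)²) ≈ √(2rΔ):

```python
import numpy as np

for r, d, delta in [(1.0, 1e-2, 0.99), (1.0, 1e-2, 0.999), (0.1, 1e-1, 0.999)]:
    Delta = (1.0 - delta) / delta * d       # depth of H below the tangency point
    half_chord = np.sqrt(r**2 - (r - Delta)**2)
    print(f"r={r}, d={d}, delta={delta}: x/2 = {half_chord:.2e}, "
          f"sqrt(2 r Delta) = {np.sqrt(2.0 * r * Delta):.2e}")
```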
[IMAGE OMITTED. SEE PDF]
In light of Theorem 1, when β > 2 one might hope to find conditions under which inefficiency of order (1 − δ)^{β/2} is attainable. While this may be possible, we do not pursue such a result here. The difficulty is that the corresponding ball B would have radius r of at least order (1 − δ)^{1−β/2} (as rd must be at least of order 1 − δ). While such a ball can satisfy the self-generation condition in a neighborhood of v, its radius explodes as δ → 1 (when β > 2), so it must violate self-generation at some point far from v. Therefore, any conditions that ensure that inefficiency of order less than 1 − δ is attainable must involve the global geometry of the feasible payoff set. Investigating such conditions is left for future work.
We finally mention a class of infinite games where Assumption 2(ii) holds. Say that the game is linear-concave if (i) for each i, A_i is a compact interval of the real line, and u_i is differentiable and concave in a_i for every a_{−i}, with a derivative bounded uniformly over all i, a; and (ii) the public signal is a D-dimensional real variable whose mean is a linear function of a in each dimension d. In a linear-concave game, let B_i be the D-dimensional vector representing the sensitivity of the mean public signal to player i's action. Say that a linear-concave game satisfies pairwise identifiability if, for any a and i ≠ j, B_i ≠ 0, and the spans of B_i and B_j intersect only at the origin.
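A minimal sketch of the pairwise identifiability condition in a hypothetical two-player linear-concave example (the vectors B1, B2 and all numbers are ours, chosen for illustration):

```python
import numpy as np

B1 = np.array([1.0, 0.2])  # sensitivity of the D = 2 mean signal to player 1's action
B2 = np.array([0.1, 1.0])  # sensitivity to player 2's action

# The one-dimensional spans of B1 and B2 intersect only at the origin exactly
# when the two vectors are nonzero and linearly independent.
ok = (np.linalg.norm(B1) > 0 and np.linalg.norm(B2) > 0
      and np.linalg.matrix_rank(np.stack([B1, B2])) == 2)
print(f"pairwise identifiability holds: {ok}")
```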
Proposition
In any linear-concave game satisfying pairwise identifiability, Assumption 2(ii) holds.
Proof of Theorem 2
We recall a key definition and lemma from Abreu, Pearce, and Stacchetti (1990).
Definition
A bounded set W is self-generating if, for all u ∈ W, there exist a (possibly correlated) action profile distribution α and continuation payoffs w(y) satisfying
- Promise keeping (PK): u = (1 − δ)u(α) + δE_α[w(y)].
- Incentive compatibility (IC): no player i can profitably deviate from her recommended action, given the continuation payoffs w_i(y), for all i.
- Self-generation (SG): w(y) ∈ W for all y.
When (PK), (IC), and (SG) hold, we say that the pair (α, w) decomposes u on W.
Lemma
Any bounded, self-generating set W is contained in the set of PPE payoffs.
It thus suffices to find a bounded, self-generating set W containing a point w with λ·(v − w) ≤ c(1 − δ)^{β/2}. To do so, we first establish a sufficient condition for a ball B to be self-generating. This condition builds on Fudenberg and Levine (1994) and Sugaya and Wolitzky (2023).
Definition
The maximum score in direction with reward bound is
- 1. (IC): for all i.
- 2. Half-space decomposability with reward bound (HS): and for all y.
Lemma
For any and , if a ball B of radius r satisfies
We then show that there exists a ball B at the desired distance from v that satisfies the sufficient condition for self-generation just given.
Lemma
There exist , and such that, for any , there exist and a ball B of radius r satisfying (8), (9), and .
The proof of Lemma 3 uses Assumption 2 and the assumptions that λ_i > 0 for all i and that bnd(F*) has min-curvature of order β at v. The logic is similar to that accompanying Figure 1.
The proofs of Lemmas 2 and 3 are deferred to the Appendix. Given these lemmas, taking , c, and as in Lemma 3 establishes Theorem 2.
A stronger result for the principal-agent problem
In this section, we establish that withholding feedback in a standard repeated principal-agent problem leaves unchanged not only the rate of convergence to efficiency (the order of inefficiency in 1 − δ), but also the exact level of first-order inefficiency (the constant multiplying 1 − δ). This stronger result also has the virtue of identifying the precise features of the stage game and the monitoring structure that determine the level of first-order inefficiency.
Consider a standard repeated principal-agent problem in discrete time. In each period t, an agent chooses an effort level a from a compact interval A, and a signal y is then drawn according to a pmf or pdf p(y | a). Assume that p(y | a) is twice continuously differentiable in a, with first and second derivatives p_a and p_aa. A contract specifies, for each period t, a recommended effort level as a function of the history of past recommendations and signals, as well as the agent's current consumption as a function of that history. In the public game, the agent chooses her period t action as a function of the full history; in the blind game, she chooses as a function of her own past recommendations and actions only. The agent's payoff in period t is flow utility from consumption less the cost of effort, where the consumption utility u is twice continuously differentiable with , , , , and
We remark that our public game is nearly identical to the model of Spear and Srivastava (1987) (although unlike them we do not require a monotone likelihood ratio) or a discrete-time version of the baseline model of Sannikov (2008) (although we allow much more general monitoring structures, as discussed below). However, those papers do not consider the possibility of withholding feedback from the agent.
For any effort level a, the score of the signal y is the derivative of the log-likelihood of y with respect to effort, p_a(y | a)/p(y | a).
Assumption
The following hold:
- i. For all , there exists such that
- ii. is strictly positive and Lipschitz continuous on A.
- iii. The score is sub-Gaussian with variance proxy :
- iv. There exists such that, for all , we have
For example, Assumption 3 is satisfied if Y is finite, or if Y = ℝ^D and y = f(a) + ε for a deterministic function f with a bounded gradient and multivariate normal noise ε with covariance independent of a. Note that Assumption 3(iii) strengthens Assumption 1.
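For the Gaussian case just mentioned, the score has a closed form. The sketch below assumes a one-dimensional signal y = f(a) + ε with ε ~ N(0, σ²) and a hypothetical production function f(a) = √a; the score f′(a)(y − f(a))/σ² is then itself Gaussian, hence sub-Gaussian as in Assumption 3(iii):

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma = 0.5, 1.0
f = lambda x: np.sqrt(x)              # hypothetical production function
f_prime = lambda x: 0.5 / np.sqrt(x)

y = f(a) + sigma * rng.normal(size=1_000_000)
score = f_prime(a) * (y - f(a)) / sigma**2  # d/da log p(y | a) with Gaussian noise
print(f"mean ~ {score.mean():.1e}, variance ~ {score.var():.4f}, "
      f"theory f'(a)^2/sigma^2 = {f_prime(a)**2 / sigma**2:.4f}")
```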
For any , where , let be the first-best payoff for the principal when the agent's payoff equals w, which is given by
Finally, let the blind-game (resp., public-game) second-best payoff denote the maximum payoff for the principal over all equilibria of the blind game (resp., public game) where the agent's payoff is w. Recall that every public-game equilibrium outcome can be replicated in the blind game, so the blind-game second-best payoff is weakly higher. Nonetheless, we show that the two second-best payoffs agree up to a first-order approximation as δ → 1.
Theorem
For any and , we have
Theorem 3 shows that, whether or not the agent receives feedback, the first-order inefficiency of an optimal contract is precisely
A rough intuition for Theorem 3 is that, with high probability, the agent's continuation payoff is approximately constant for a long time under an optimal contract, so there is little information about the continuation payoff to conceal, and thus little to gain from concealing it.
The proof of Theorem 3 is facilitated by the principal's ability to commit to delivering any feasible promised continuation value for the agent. It may be possible to generalize Theorem 3 to smooth games with 1-dimensional actions and product structure monitoring (as considered by Sannikov (2007)), but this would require constructing equilibria that attain specific continuation payoff vectors far from the initial target vector. This possibility is left for future research.
We finally comment on the role of condition (10). This condition implies that the second-order efficiency loss from varying the agent's utility is uniformly bounded away from zero. With CRRA utility , it holds if and only if . Without this condition, review strategies with infrequent, large rewards may yield a first-order improvement over (14) if converges to 0 sufficiently slowly as .
Proof sketch for Theorem 3
We first bound the blind-game second-best payoff from above. Fix a period t and a small constant, and consider the manipulation where, whenever the agent is recommended effort a in period t, she instead takes slightly lower effort. (The agent thus shades her effort more after recommendations where effort is more costly or less detectable.) For this manipulation to be unprofitable for all t, we must have
We next bound the public-game second-best payoff from below. Given a continuation payoff for the agent, suppose the principal implements first-best effort by offering the corresponding first-best consumption and providing incentives entirely by varying the continuation payoff while making it a martingale. The Taylor approximation of inefficiency is then equal to
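This Taylor step can be illustrated numerically. The sketch below uses a hypothetical concave frontier F1(w) = −w² (our stand-in, not the model's actual first-best frontier) and a mean-zero martingale perturbation of the promised value; the simulated per-period loss matches −F1″(w)·Var/2:

```python
import numpy as np

rng = np.random.default_rng(1)
F1 = lambda w: -w**2        # hypothetical concave first-best frontier
w, var = 0.3, 1e-4          # current promise; per-period variance of the movement

noise = np.sqrt(var) * rng.normal(size=1_000_000)  # martingale increment: mean zero
loss = F1(w) - F1(w + noise).mean()
print(f"simulated per-period loss {loss:.3e} vs -F1''(w)*var/2 = {var:.3e}")
```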
Discussion
The low-discounting/low-monitoring double limit
This paper focuses on the rate at which inefficiency vanishes as δ → 1 for a fixed monitoring structure. In contrast, in Sugaya and Wolitzky (2023) we showed that in the double limit where δ → 1 and monitoring precision simultaneously degrades, efficiency depends on a ratio of discounting and monitoring precision. This double limit arises, for example, in the “frequent action limit” considered by Abreu, Milgrom, and Pearce (1991), Fudenberg and Levine (2007), Sannikov and Skrzypacz (2010), and Sadzik and Stacchetti (2015), where signals are parameterized by an underlying continuous-time process, actions and signal observations occur simultaneously every Δ units of time, and the analysis concerns the Δ → 0 limit.
The results of the current paper extend to the low-discounting/low-monitoring double limit. To see this, maintain the assumption that the monitoring structure is sub-Gaussian with variance proxy K, but now view K as a variable that varies simultaneously with the discount factor. Since K proxies the variance of the likelihood ratio difference, a lower value for K corresponds to less precise monitoring, so the low-discounting/low-monitoring double limit arises when δ → 1 and K → 0 simultaneously. For example, in the standard frequent action limit, discounting and monitoring vanish at the same rate, so K/(1 − δ) remains constant as δ → 1 and K → 0.
From this more general perspective, it can be shown (by nearly the same proof) that Theorem 1 holds verbatim with (1 − δ)/K in place of 1 − δ. Conversely, Theorem 2 also holds with (1 − δ)/K in place of 1 − δ, under the condition that the bound in Assumption 2 can be taken to be of the corresponding order in K. For example, this condition holds with finite signals with p(y|a) bounded away from zero, or with Gaussian signals.
Summary and directions for future research
This paper has used a rate-of-convergence approach to analyze the gains from nonrecursive equilibria in repeated agency problems and games with patient players. The main result is that these gains are “small”: (i) in finite-action games, nonrecursive equilibria reduce inefficiency by at most a log factor; (ii) in smooth games, nonrecursive equilibria reduce inefficiency by at most a constant factor; and (iii) in smooth principal-agent problems, nonrecursive equilibria do not reduce first-order inefficiency at all. The key force underlying these results is that, while pooling information across periods improves monitoring precision, it also entails larger rewards and punishments, which reduce the scope for providing incentives by transferring future surplus between the players rather than destroying it.
A basic lesson of our analysis is that the value of withholding feedback in dynamic agency is very different in a one-off production process that unfolds gradually over time (as in Holmström and Milgrom (1987)) as compared to a repeated interaction. Since continuation payoff transfers are impossible in one-shot interactions, the monitoring benefit of withholding feedback dominates, so withholding feedback can be very valuable. But in repeated interactions, this benefit is offset by the cost of using larger rewards and punishments, which limit continuation payoff transfers.
We mention some possible extensions of our results. First, as discussed in Section 5, characterizing rates of convergence toward extreme points with curvature of order β > 2 is a challenging open question involving nonlocal properties of the feasible payoff set. Second, as discussed in Section 6, it may be possible to generalize Theorem 3 from smooth agency problems to smooth games. Third, it would be interesting to relax the assumption that the likelihood ratio difference is sub-Gaussian. This could result in a faster rate of convergence, because rare but highly informative signals would become more common, and such signals become more useful as δ increases. Fourth, it would be interesting to extend our results to irreducible stochastic games with observable states, where one can investigate the rate of convergence to extreme points of the limit feasible payoff set. Fifth, one could consider stochastic games where the state is only observed by the principal. In this setting, withholding information has costs as well as benefits, which are interesting to compare. Finally, the rate of convergence to efficiency as discounting vanishes may be a useful lens for analyzing a range of other questions about long-run economic relationships, beyond the value of withholding performance feedback.
Appendix
Proof of Theorem 1
We first bound a player's deviation gain at any action profile distribution that attains payoffs close to v.
Lemma
There exist and such that, for all satisfying , there exist a player i and a manipulation such that .
Proof
Since , for all such that , there exist i and such that . Let
Now suppose that for all there exists satisfying and for all i, . Taking a subsequence if necessary, . Moreover, we have (since and ), and for all i (by the maximum theorem), so , contradicting the definition of γ. □
Fix such ε and γ. Next, for any outcome μ and period T, define the occupation measure over the first T periods by , and define . We first bound for any μ where all players' deviation gains over the first periods are small.
Lemma
For any outcome μ where for all players i and manipulations , we have .
Proof
Since by construction, we have
We next establish the incentive constraint, (2).
Lemma
For any equilibrium outcome , player i, manipulation , and period t, we have .
Proof
For any sequence of action profiles and any period t, let . Since μ is an equilibrium outcome, for every we have
We now come to our key lemma, which bounds (4)—and hence —for any μ where some player's deviation gain over the first periods is large.
Lemma
There exists such that, for any outcome μ, player i, manipulation , and discount factor satisfying , we have
Together, Lemmas 5, 6, and 7 imply that for all and . Theorem 1 therefore holds with .
It thus remains to prove Lemma 7. To this end, let if , and otherwise. Letting to ease notation, we then have
Lemma
For each , there exists such that, for any μ and δ, the value of (16) is no less than .
In turn, Lemma 8 relies on the following large deviations bound for martingales.
Lemma
Let be a sequence of martingale increments adapted to a filtration , so that , and let be a stochastic process adapted to the same filtration satisfying for all and . Let and . For all , we have , and hence (i) for all , and (ii) for all .
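Lemma 9 is an Azuma–Hoeffding-type bound. As a sanity check, the following Monte Carlo sketch (standard normal increments, which have variance proxy K = 1, and arbitrary predictable weights q_t of our choosing) verifies a tail bound of the form exp(−z²/(2K∑q_t²)):

```python
import numpy as np

rng = np.random.default_rng(2)
T, K, z = 200, 1.0, 10.0
q = rng.uniform(0.5, 1.0, size=T)      # predictable weights (illustrative)
X = rng.normal(size=(100_000, T))      # conditionally sub-Gaussian increments

S = (q * X).sum(axis=1)                # weighted martingale at time T
emp = (S >= z).mean()
bound = np.exp(-z**2 / (2.0 * K * (q**2).sum()))
print(f"empirical tail {emp:.4f} <= bound {bound:.4f}: {emp <= bound}")
```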
Proof
By iterated expectation,
Proof of Lemma 8
We consider separately the cases where and .
Case 1: When , the minimand in (16) is linear in . Minimizing over , we see that (16) equals
Case 2: When , the minimand in (16) is convex in . Relaxing the constraint and minimizing over gives
Appendix
Proof of Proposition 1
Consider a review strategy where the game is divided into blocks of T consecutive periods. Let , where is a small number to be determined: note that when . In the first block, the players are prescribed in every period. At the end of the first block—as well as any subsequent block where is prescribed—the mediator records the summary statistic
We show that there exists and such that, for any , the parameters ρ and q can be chosen so that this strategy profile is an equilibrium that yields payoff for each player.
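To fix ideas before the formal argument, the following sketch computes pass probabilities of the kind defined next (a block “passes” if the count of good signals reaches a cutoff) for hypothetical parameters: good-signal probabilities 0.7 under C and 0.5 under D, block length T = 100, and cutoff ρ = 0.6. These numbers are illustrative, not the paper's calibration:

```python
import numpy as np
from scipy.stats import binom

T, rho = 100, 0.60
m = int(rho * T)        # block passes if at least m good signals are recorded
pC, pD = 0.7, 0.5       # Pr(good signal) under C and under D (assumed)

pass_all_C = binom.sf(m - 1, T, pC)                 # C in every period
pass_one_D = (0.5 * binom.sf(m - 2, T - 1, pC)      # one D period (good w.p. 0.5),
              + 0.5 * binom.sf(m - 1, T - 1, pC))   # C in the remaining T - 1
pass_all_D = binom.sf(m - 1, T, pD)                 # D in every period
print(f"pass prob: all C {pass_all_C:.3f}, one D {pass_one_D:.3f}, "
      f"all D {pass_all_D:.3f}")
```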
Let p be the probability that when player 1 takes C throughout a block; let be the probability that when player 1 takes D once and takes C times; and let be the probability that when player 1 takes D throughout. Observe that v is given by
Define
We now establish (22)–(24). Let . Note that
We first establish (22). Recall that the are independent Bernoulli random variables. As shown by Zhu, Li, and Hayashi (2022, Theorem 2.1),
We next establish (23). Applying Stirling's formula to (25), we have
Finally, we establish (24). We will show that and . Hence, for sufficiently large δ, . This implies (24), as we have
It remains to show that and . Note that the random variable has zero mean and unit variance when player 1 takes C. Thus, by the Berry–Esseen theorem, there exists an absolute constant such that
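A quick numerical illustration of the Berry–Esseen step (hypothetical parameters): the maximal gap between the standardized binomial CDF and the standard normal CDF shrinks at the rate 1/√T:

```python
import numpy as np
from scipy.stats import binom, norm

p = 0.7
for T in (100, 400, 1600):
    k = np.arange(T + 1)
    z = (k - T * p) / np.sqrt(T * p * (1.0 - p))
    gap = np.abs(binom.cdf(k, T, p) - norm.cdf(z)).max()
    print(f"T = {T}: max CDF gap = {gap:.4f}, 1/sqrt(T) = {1.0 / np.sqrt(T):.4f}")
```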
Appendix
Proof of Proposition 2
To define , we first observe that for each pair of players and each action profile a, we can take such that (i) has mean 0 and bounded Euclidean norm; (ii) rewards induce player i to take when her opponents take ; and (iii) is independent of player j's action.
Lemma
There exists such that, for each pair of players and action profile , there exist such that , , , and for all y.
Proof
For each a and , let be the value of the program
Since is compact and N is finite, it suffices to prove that, for each , (i) for all a, and (ii) is upper semicontinuous.
We first prove (i). As in Lemma 1 of Sannikov (2007), pairwise identifiability implies that the columns of are linearly independent, so there exists such that is a D-dimensional invertible matrix. For
We next prove (ii). Fix any a and . There exists b such that and b satisfies and . Take as in the proof of (i). Taking sufficiently small, we can guarantee that is a D-dimensional invertible matrix for each with . Define a D-dimensional vector by
Given Lemma 10, Assumption 2(ii) holds with . To see why, for any i and a, let and . Then
Appendix
Proof of Lemma 2
The proof is similar to (but simpler than) the proof of Lemma 6 of Sugaya and Wolitzky (2023). To show that B is self-generating, it suffices to show that the extreme points of any ball of radius are decomposable on .
Lemma
Appendix
Proof of Lemma 3
Since
The following lemma is similar to Lemma 5 of Hörner and Takahashi (2016) or Lemma 7 and pp. 1750–1751 of Sugaya and Wolitzky (2023).
Lemma
There exists
Proof
Let
(i)
(ii)
(iii) Otherwise, there exists
By Lemma 13, it suffices to find
If
For the rest of the proof, we assume that
Lemma
There exist
Proof
Since
Now fix any
To see this, let
Lemma
There exist
Proof
Fix
Recall that, by construction,
We consider separately the cases where
Next, consider
We now complete the proof of Lemma 3. Take
Appendix
Proof of Theorem 3
We first show that first-order inefficiency in the blind game is no less than (14). Fix
By feasibility, the principal's payoff is at most
The following is the key lemma.
Lemma
There exist
We sketch the proof of Lemma 16, providing the details in the next section. Subtracting
At the same time, since
Lemma
There exist
We now show that first-order inefficiency in the public game is no more than (14). The proof is constructive. As a first step, it is helpful to construct a static contract that induces a target effort level
Heuristically, the repeated game equilibrium is constructed by using the above reward to adjust the agent's continuation payoff
Formally, fix any
The proof is completed by the following lemma, which shows that the first-order inefficiency of this equilibrium is no more than (14).
Lemma
There exist
Intuitively, since
Appendix
Omitted details for the proof of Theorem 3
We require some preliminary lemmas. The first two derive properties of the feasible payoff set.
Lemma
There exists
Proof
Since
Lemma
There exists
Proof
Differentiating the equality
The next lemma gives a key probability bound.
Lemma
For any
Proof
Let
We first establish (44). Since
We next establish (45). For any t, we have
We now establish inequality (15).
Lemma
We have
Proof
Fix any t and
Now we prove our key lemmas, Lemmas 16 and 17. These complete the proof that inefficiency is at least (14) in the blind game.
Proof of Lemma 16
Multiplying both sides of (38) by
It remains to bound (46). Since
Lemma
There exists
Proof
For sufficiently large δ, we have
Lemma
For any sufficiently large
Proof
We first show that, for sufficiently large δ,
Thus, for sufficiently large δ, we have
By (48), (49), and
Proof of Lemma 17
If α assigns probability 1 to
Since
Construction of
Lemma
There exists
Proof
Define, in turn,
For (58), note that
For (59), note that
We now establish (52)–(55). Note that (53) follows directly from (57). For (52), for any
By Cauchy–Schwarz, the first line is bounded by
Similarly, again by Cauchy–Schwarz, the second line is bounded by
We next establish (54). By construction, we have
We finally establish (55). It suffices to show that, for any
Equilibrium verification
We verify that the contract defined in the main Appendix, with
Lemma
For each
Proof
The conclusion is immediate if
Proof of Lemma 18
Let
We first bound the weight on irregular histories under the equilibrium outcome μ.
Lemma
For any sufficiently large
Proof
Note that
Recall that
At the same time, since
Combining these bounds, we have
It remains to bound
Abreu, Dilip, Paul Milgrom, and David Pearce (1991), “Information and timing in repeated partnerships.” Econometrica, 59 (6), 1713–1733.
Abreu, Dilip, David Pearce, and Ennio Stacchetti (1990), “Toward a theory of discounted repeated games with imperfect monitoring.” Econometrica, 58 (5), 1041–1063.
Aliprantis, Charalambos and Kim Border (2006), Infinite Dimensional Analysis: A Hitchhiker's Guide. Springer Science & Business Media.
Aoyagi, Masaki (2010), “Information feedback in a dynamic tournament.” Games and Economic Behavior, 70 (2), 242–260.
Athey, Susan and Kyle Bagwell (2001), “Optimal collusion with private information.” RAND Journal of Economics, 32 (3), 428–465.
Ball, Ian (2023), “Dynamic information provision: Rewarding the past and guiding the future.” Econometrica, 91 (4), 1363–1391.
Buldygin, Valerii and Yu Kozachenko (2000), Metric Characterization of Random Variables and Random Processes, Vol. 188. American Mathematical Society.
Ederer, Florian (2010), “Feedback and motivation in dynamic tournaments.” Journal of Economics & Management Strategy, 19 (3), 733–769.
Ely, Jeffrey, George Georgiadis, Sina Khorasani, and Luis Rayo (2023), “Optimal feedback in contests.” Review of Economic Studies, 90 (5), 2370–2394.
Ely, Jeffrey, George Georgiadis, and Luis Rayo (2025), “Feedback design in dynamic moral hazard.” Econometrica, 93 (2), 597–621.
Ely, Jeffrey C. and Martin Szydlowski (2020), “Moving the goalposts.” Journal of Political Economy, 128 (2), 468–506.
Forges, Francoise (1986), “An approach to communication equilibria.” Econometrica, 54 (6), 1375–1385.
Frick, Mira, Ryota Iijima, and Yuhta Ishii (2024), “Monitoring with rich data.” Working Paper.
Fuchs, William (2007), “Contracting with repeated moral hazard and private evaluations.” American Economic Review, 97 (4), 1432–1448.
Fudenberg, Drew, David Levine, and Eric Maskin (1994), “The folk theorem with imperfect public information.” Econometrica, 62 (5), 997–1039.
Fudenberg, Drew and David Levine (1994), “Efficiency and observability with long‐run and short‐run players.” Journal of Economic Theory, 62 (1), 103–135.
Fudenberg, Drew and David Levine (2007), “Continuous time limits of repeated games with imperfect public monitoring.” Review of Economic Dynamics, 10 (2), 173–192.
Gershkov, Alex and Motty Perry (2009), “Tournaments with midterm reviews.” Games and Economic Behavior, 66 (1), 162–190.
Goldlücke, Susanne and Sebastian Kranz (2012), “Infinitely repeated games with public monitoring and monetary transfers.” Journal of Economic Theory, 147 (3), 1191–1221.
Green, Edward J. and Robert H. Porter (1984), “Noncooperative collusion under imperfect price information.” Econometrica, 52 (1), 87–100.
Halac, Marina, Navin Kartik, and Qingmin Liu (2017), “Contests for experimentation.” Journal of Political Economy, 125 (5), 1523–1569.
Holmström, Bengt and Paul Milgrom (1987), “Aggregation and linearity in the provision of intertemporal incentives.” Econometrica, 55 (2), 303–328.
Hörner, Johannes and Satoru Takahashi (2016), “How fast do equilibrium payoff sets converge in repeated games?” Journal of Economic Theory, 165, 332–359.
Kandori, Michihiro (2002), “Introduction to repeated games with private monitoring.” Journal of Economic Theory, 102 (1), 1–15.
Kandori, Michihiro and Hitoshi Matsushima (1998), “Private observation, communication and collusion.” Econometrica, 66 (3), 627–652.
Kandori, Michihiro and Ichiro Obara (2006), “Efficiency in repeated games revisited: The role of private strategies.” Econometrica, 74 (2), 499–519.
Levin, Jonathan (2003), “Relational incentive contracts.” American Economic Review, 93 (3), 835–857.
Lizzeri, Alessandro, Margaret Meyer, and Nicola Persico (2002), “The incentive effects of interim performance evaluations.” Working Paper.
Madrigale, Vicente (1986), “On the non‐existence of efficient equilibria of repeated principal agent games with discounting.” Working Paper.
Matsushima, Hitoshi (2001), “Multimarket contact, imperfect monitoring, and implicit collusion.” Journal of Economic Theory, 98 (1), 158–178.
Matsushima, Hitoshi (2004), “Repeated games with private monitoring: Two players.” Econometrica, 72 (3), 823–852.
Meng, Delong (2021), “On the value of repetition for communication games.” Games and Economic Behavior, 127, 227–246.
Mirrlees, James A. (1975), “The theory of moral hazard and unobservable behaviour: Part I.” Working Paper. Published in Review of Economic Studies, 66 (1) (1999), 3–21.
Orlov, Dmitry, Andrzej Skrzypacz, and Pavel Zryumov (2020), “Persuading the principal to wait.” Journal of Political Economy, 128 (7), 2542–2578.
Radner, Roy (1985), “Repeated principal‐agent games with discounting.” Econometrica, 53 (5), 1173–1198.
Rahman, David (2014), “The power of communication.” American Economic Review, 104 (11), 3737–3751.
Rubinstein, Ariel (1979), “An optimal conviction policy for offenses that may have been committed by accident.” In Applied Game Theory (Brams, Schotter, and Schwödiauer, eds.), 406–413, Physica-Verlag, Heidelberg.
Rubinstein, Ariel and Menahem E. Yaari (1983), “Repeated insurance contracts and moral hazard.” Journal of Economic Theory, 30 (1), 74–97.
Sadzik, Tomasz and Ennio Stacchetti (2015), “Agency models with frequent actions.” Econometrica, 83 (1), 193–237.
Sannikov, Yuliy (2007), “Games with imperfectly observable actions in continuous time.” Econometrica, 75 (5), 1285–1329.
Sannikov, Yuliy (2008), “A continuous‐time version of the principal‐agent problem.” Review of Economic Studies, 75 (3), 957–984.
Sannikov, Yuliy and Andrzej Skrzypacz (2007), “Impossibility of collusion under imperfect monitoring with flexible production.” American Economic Review, 97 (5), 1794–1823.
Sannikov, Yuliy and Andrzej Skrzypacz (2010), “The role of information in repeated games with frequent actions.” Econometrica, 78 (3), 847–882.
Smolin, Alex (2021), “Dynamic evaluation design.” American Economic Journal: Microeconomics, 13 (4), 300–331.
Spear, Stephen E. and Sanjay Srivastava (1987), “On repeated moral hazard with discounting.” Review of Economic Studies, 54 (4), 599–617.
Sugaya, Takuo (2022), “Folk theorem in repeated games with private monitoring.” Review of Economic Studies, 89 (4), 2201–2256.
Sugaya, Takuo and Alexander Wolitzky (2017), “Bounding equilibrium payoffs in repeated games with private monitoring.” Theoretical Economics, 12, 691–729.
Sugaya, Takuo and Alexander Wolitzky (2018), “Maintaining privacy in cartels.” Journal of Political Economy, 126 (6), 2569–2607.
Sugaya, Takuo and Alexander Wolitzky (2023), “Monitoring vs. Discounting in repeated games.” Econometrica, 91 (5), 1727–1761.
Wood, David C. (1992), “The computation of polylogarithms.” Working Paper.
Zhu, Huangjun, Zihao Li, and Masahito Hayashi (2022), “Nearly tight universal bounds for the binomial tail probabilities.” Working Paper.