1. Introduction
Dynamic or differential games are an important branch of game theory that investigates interactive decision-making over time. A differential game is one in which the decision process evolves in continuous time and is generally governed by a differential equation. Differential games provide an effective tool for studying a wide range of dynamic processes, for example, problems of pollution control, where it is important to analyze the interaction between participants' strategic behavior and the dynamic evolution of the stock of pollutants that polluters release. In Carlson and Leitmann [1], a direct method for finding open-loop Nash equilibria for a class of differential n-player games is presented.
Cooperative optimization points to the possibility of socially optimal and individually rational solutions to decision-making problems involving strategic action over time. A cooperative differential game is typically solved with a two-step procedure: first, one determines the collectively optimal solution; then the total payoff is transferred and distributed among players using one of the many available cooperative game solutions, such as the core, the Shapley value, or the nucleolus. In a dynamic cooperative game, it must also be ensured that all players comply with the agreement over time. This holds if each player's cooperative profit at any intermediate moment dominates their non-cooperative profit. This property is known as time consistency and was introduced by Petrosjan (originally in 1977) [2].
In order to derive equilibrium solutions, existing differential game models often rely on the assumption of a time-invariant game structure. However, the future is full of essentially unknown events, so it is necessary to account for the fact that the information available to players about the future is limited. In the realm of dynamic updating, the Looking Forward Approach is used in game theory, and in differential games in particular. The Looking Forward Approach addresses the problem of modeling players' behavior when information about the process is updated dynamically: it does not rely on a target trajectory, but instead specifies how the trajectory used by the players is composed and how the cooperative payoff is allocated along it. The Looking Forward Approach was first presented in [3]; the works in [4,5,6,7,8,9] followed.
In [10,11,12,13,14,15], a class of non-cooperative differential games with continuous updating is considered, in which the updating process continues to develop over time. In [10], the Hamilton–Jacobi–Bellman equations for the Nash equilibrium in the game with continuous updating are derived. The work in [11] is devoted to the class of cooperative differential games with transferable utility: Hamilton–Jacobi–Bellman equations are used to construct the characteristic function with continuous updating, and several related theorems are proved. Another result based on Hamilton–Jacobi–Bellman equations with continuous updating concerns the class of cooperative differential games with non-transferable utility [15]. The works in [13,14] are devoted to the class of linear-quadratic differential games with continuous updating; there, the cooperative and non-cooperative cases are considered and the corresponding solutions are obtained. In [12], the explicit form of the Nash equilibrium for the differential game with continuous updating is derived using the Pontryagin maximum principle. In the present paper, the class of cooperative game models is examined: the notions of cooperative strategies, the characteristic function, and the cooperative solution are constructed for a class of games with continuous updating using the Pontryagin maximum principle. The theoretical results are illustrated for three players on a classic differential game model of pollution control presented in [16]. Another potentially important application of the continuous updating approach is the class of inverse optimal control problems with continuous updating [17]; the approach can be used for modeling human behavior in engineering systems.
The class of differential games with dynamic and continuous updating has some similarities with Model Predictive Control (MPC) theory, which is worked out within the framework of numerical optimal control in the books [18,19,20,21]. In MPC, the current control action is obtained by solving a finite-horizon open-loop optimal control problem at each sampling instant. For linear systems, an explicit solution exists [22,23]; in general, however, the MPC approach requires solving several optimization problems. Another series of related papers, belonging to the stabilizing control literature, is [24,25,26,27], in which similar methods are considered for linear-quadratic optimal control problems. However, the goals of MPC and of the continuous updating approach are different: continuous updating aims to model players' behavior when information about the game process is updated continuously over time. In [28,29], a similar issue is considered: the authors investigate repeated games with sliding planning horizons.
The paper is structured as follows. Section 2 presents the initial differential game model and its cooperative version. Section 3 introduces the differential game with continuous updating and the cooperative differential game with continuous updating using the Pontryagin maximum principle, defines the characteristic function with continuous updating, and presents the main theoretical results. Section 4 gives an example of pollution control with continuous updating. Section 5 concludes.
2. Initial Differential Game Model
2.1. Preliminary Knowledge
Consider the differential game starting from the initial position x_0 and evolving on the time interval [t_0, T]. The equations of the system's dynamics have the form
x˙(t)=f(t,x,u),x(t0)=x0,
where x ∈ R^l is the vector of variables characterizing the state of the dynamical system at any instant of the game, and u = (u_1, …, u_n), u_i ∈ U_i ⊂ comp R^k, is the control of player i. We shall use the notation U = U_1 × U_2 × … × U_n.
The existence, uniqueness, and continuability of the solution x(t) for any admissible measurable controls u_1(·), …, u_n(·) hold under the following conditions (Tolwinski, Haurie, and Leitmann [30]):
- f(·): R × R^l × U → R^l is continuous.
- There exists a positive constant k such that ‖f(t, x, u)‖ ≤ k(1 + ‖x‖) for all t ∈ [t_0, T] and all u ∈ U.
- For every R > 0 there exists K_R > 0 such that ‖f(t, x, u) − f(t, x′, u)‖ ≤ K_R ‖x − x′‖ for all t ∈ [t_0, T], all u ∈ U, and all x, x′ with ‖x‖ ≤ R and ‖x′‖ ≤ R.
- For any t ∈ [t_0, T] and x ∈ X, the set G(x, t) = {f(t, x, u) | u ∈ U} is a convex compact subset of R^l.
The payoff of player i is then defined as
Ki(x0,t0,T;u)=∫t0Tgi[t,x(t),u]dt,i∈N,
where g_i[t, x, u] and f(t, x, u) are integrable functions and x(t) is the solution of the Cauchy problem (1) with fixed open-loop controls u(t) = (u_1(t), …, u_n(t)). The strategy profile u(t) = (u_1(t), …, u_n(t)) is called admissible if problem (1) has a unique and continuable solution.
Let us agree that, in a differential game, a subgame is a "truncated" version of the whole game. A subgame is a game in its own right; it starts at a time instant t ∈ [t_0, T], after a particular history of actions u(t). Denote such a subgame by Γ(x, t, T). (A remark on this notation is in order. Let the state of the game be defined by the pair (x, t), and denote by Γ(x, t, T) the subgame starting at date t with state variable x; the model considered here has a finite horizon with terminal time T. With an infinite horizon, one would of course expect all corresponding value functions to depend on the state only and not on time.)
For each (x, t, T) ∈ X × [t_0, T] × R, we define a subgame Γ(x, t, T) by replacing the objective functional of player i and the system dynamics by
Ki(x,t,T;u(s))=∫tTgi[s,x(s),u(s)]ds,i∈N,
and
x˙(s)=f(s,x(s),u(s)),x(t)=x,
respectively. Therefore, Γ(x, t, T) is a differential game defined on the time interval [t, T] with initial condition x(t) = x.
2.2. Cooperative Differential Game Model
We adopt a cooperative game methodology to solve the differential game model with transferable utility. The steps are as follows.
- Define the cooperative behavior or strategies and corresponding cooperative trajectory.
- Compute the values of the characteristic function.
- Allocate the total cooperative payoff among the players so that the allocation belongs to a chosen cooperative solution: the kernel, the bargaining set, the stable set, the core, the Shapley value, or the nucleolus (see, e.g., Osborne and Rubinstein [31] for an introduction to these concepts).
First of all, we introduce the notion of cooperative strategies u* = (u_1*, …, u_n*) and the corresponding cooperative trajectory x*(t). The strategies u*(t) are the optimal strategies, i.e., the set of controls that maximizes the joint payoff of the players:
$$u^* = (u_1^*, \dots, u_n^*) = \arg\max_{u_1, \dots, u_n} \sum_{i \in N} K_i(x_0, t_0, T; u).$$
Suppose that the maximum in (2) is achieved on the set of admissible strategies. Substituting u*(t) into Equation (1), we obtain the cooperative trajectory x*(t).
Consequently, in order to determine how to distribute the maximum total payoff among the players, it is fundamental to define the characteristic function of a coalition S ⊆ N. The characteristic function measures the strength of a coalition; in particular, it allows us to take into account each player's contribution to every coalition.
To define a cooperative game, a characteristic function must be introduced. We call V(S; x_0, t_0, T), S ⊆ N, the characteristic function of the initial differential game Γ(x_0, t_0, T). By a characteristic function we understand a map defined on the set of all possible coalitions:
V(·):2N→R,V(∅)=0,
which assigns to each coalition S the total payoff value which the players from S can guarantee when acting independently. An important property is the superadditivity of a characteristic function:
V(S1∪S2;x0,t0,T)≥V(S1;x0,t0,T)+V(S2;x0,t0,T),∀S1,S2⊆N,S1∩S2=∅.
The question of how to construct a characteristic function is one of the central questions of cooperative game theory. Originally, the value of the characteristic function V(S) was interpreted by von Neumann and Morgenstern (1944) as the maximum guaranteed payoff that coalition S can obtain when acting independently of the other players [32]. At present, several ways of constructing characteristic functions in cooperative games are known, such as the α-c.f. [33], β-c.f. [34], ζ-c.f. [35], and γ-c.f. [36].
Similarly to the above, for each (x*(t), t, T) ∈ X × [t_0, T] × R, we define a cooperative subgame Γ^c(x*(t), t, T) (the superscript "c" stands for "cooperative") along the cooperative trajectory x*(t) by replacing the objective functional of player i and the system dynamics by
∑i∈NKi(x*(t),t,T;u)=∑i∈N∫tTgi[s,x(s),u(s)]ds,
and
x˙(s)=f(s,x(s),u(s)),x(t)=x*(t),
respectively. Therefore, Γ^c(x*(t), t, T) is a cooperative differential game defined on the time interval [t, T] with initial condition x(t) = x*(t).
In this paper, we adopt the constructive approach proposed by Petrosjan and Zaccour [37] for building the δ-characteristic function. V(S; x*(t), t, T) denotes the strength of a coalition S in the subgame Γ(x*(t), t, T); it is calculated in two stages: first, we compute the Nash equilibrium strategies {u_i^NE} for all players i ∈ N; second, we freeze the strategies of the players from N\S, while the players of coalition S maximize their joint payoff ∑_{i∈S} K_i over u_S = {u_i}_{i∈S}. Thus, the characteristic function is defined as
$$V(S; x^*(t), t, T) =
\begin{cases}
\max\limits_{u_1, \dots, u_n} \sum\limits_{i \in N} K_i(x^*(t), t, T; u), & S = N,\\[4pt]
\max\limits_{u_i,\, i \in S} \sum\limits_{i \in S} K_i(x^*(t), t, T; u_S, u^{NE}_{N \setminus S}), & S \subset N,\\[4pt]
0, & S = \varnothing.
\end{cases}$$
Denote by L(x*(t), t, T) the set of imputations in the game Γ(x*(t), t, T):
$$L(x^*(t), t, T) = \Big\{ \xi(x^*(t), t, T) = (\xi_1(x^*(t), t, T), \dots, \xi_n(x^*(t), t, T)) : \sum_{i=1}^{n} \xi_i(x^*(t), t, T) = V(N; x^*(t), t, T),\ \xi_i(x^*(t), t, T) \ge V(\{i\}; x^*(t), t, T),\ i \in N \Big\},$$
where V({i}; x*(t), t, T) is the value of the characteristic function V(S; x*(t), t, T) for the coalition S = {i}.
Denote by M(x*(t), t, T) any cooperative solution, i.e., a subset of the imputation set L(x*(t), t, T):
M(x*(t),t,T)⊆L(x*(t),t,T).
In fact, the two most widely used cooperative solutions are the Shapley value and the core. In the following, we consider a specific cooperative solution, the Shapley value [38]. The Shapley value selects a single imputation, an n-vector denoted sh(·) = (sh_1(·), sh_2(·), …, sh_n(·)), satisfying three axioms: fairness, meaning that similar players are treated equally; efficiency (∑_{i=1}^n sh_i(·) = V(N; ·)); and linearity (a relatively technical axiom required to obtain uniqueness). The Shapley value is defined uniquely and is particularly suitable for a range of applications.
The Shapley value sh(x*(t), t, T) = (sh_1(x*(t), t, T), …, sh_n(x*(t), t, T)) ∈ M(x*(t), t, T) in the game Γ^c(x*(t), t, T) is the vector such that
$$sh_i(x^*(t), t, T) = \sum_{K \subseteq N,\, i \in K} \frac{(k-1)!\,(n-k)!}{n!}\Big[ V(K; x^*(t), t, T) - V(K \setminus \{i\}; x^*(t), t, T) \Big],$$
where k = |K| is the number of players in coalition K.
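To make the combinatorial weights in this formula concrete, the following short sketch computes the Shapley value from an arbitrary characteristic function given as a dictionary; the function name and the three-player numbers are illustrative assumptions, not data from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_value(n, v):
    """Shapley value for players 1..n; v maps frozensets of players to V(S), with v[frozenset()] = 0."""
    players = range(1, n + 1)
    sh = {}
    for i in players:
        total = 0.0
        for k in range(0, n):  # k = size of K \ {i}
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for rest in combinations([j for j in players if j != i], k):
                K = frozenset(rest) | {i}
                total += weight * (v[K] - v[frozenset(rest)])
        sh[i] = total
    return sh

# Illustrative 3-player characteristic function (hypothetical numbers).
v = {frozenset(): 0, frozenset({1}): 1, frozenset({2}): 1, frozenset({3}): 2,
     frozenset({1, 2}): 4, frozenset({1, 3}): 5, frozenset({2, 3}): 6, frozenset({1, 2, 3}): 10}
print(shapley_value(3, v))  # by efficiency, the components sum to v(N) = 10
```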
3. Differential Game with Continuous Updating
3.1. Preliminary Knowledge
In order to construct the corresponding differential game with continuous updating, we use auxiliary classic differential games of prescribed duration T̄. Consider the family of games Γ(x, t, t + T̄) starting from the state x at an arbitrary time t > t_0. Furthermore, assume that the evolution of the state of the game Γ(x, t, t + T̄) can be described by the ordinary differential equation
x˙t(s)=f(s,xt(s),u(t,s)),xt(t)=x,
where ẋ_t(s) is the derivative with respect to s, x_t ∈ R^l is the state of the game starting at time t, and u(t, s) = (u_1(t, s), …, u_n(t, s)), u_i(t, s) ∈ U_i ⊂ comp R^k, s ∈ [t, t + T̄], denotes the control profile of the game starting at time t, evaluated at the instant s.
For the game Γ(x, t, t + T̄), the payoff function of player i has the following form:
$$K_i^t(x, t, t + \bar{T}; u(t, s)) = \int_{t}^{t + \bar{T}} g_i[s, x_t, u]\, ds, \quad i \in N,$$
where x_t(s) and u(t, s) are the trajectory and the strategy profile in the game Γ(x, t, t + T̄).
The differential game with continuous updating is defined according to the following rules.
The current time t ∈ [t_0, +∞) evolves continuously, and, accordingly, the players continuously obtain new information about the equations of motion and the payoff functions in the game Γ(x, t, t + T̄).
The strategy profile u(t) in the differential game with continuous updating is defined as follows:
$$u(t) = \tilde{u}(t, s)\big|_{s = t} = u(t, s)\big|_{s = t}, \quad t \in [t_0, +\infty),$$
where u(t, s), s ∈ [t, t + T̄], are strategies in the game Γ(x, t, t + T̄).
The trajectory x(t) in the differential game with continuous updating is determined by
x˙(t)=f(t,x,u),x(t0)=x0,x∈Rl,
where u = u(t) are the strategies with continuous updating defined in (6), and ẋ(t) is the derivative with respect to t. We assume that the strategies obtained from (6) are admissible, i.e., that the uniqueness and continuability of the solution of problem (7) are guaranteed. The existence, uniqueness, and continuity conditions for the open-loop Nash equilibrium in the game with continuous updating were mentioned above.
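The construction in (6) and (7) is close in spirit to a receding-horizon scheme: at each instant the players (conceptually) solve a problem on [t, t + T̄] and apply only the value of the resulting control at s = t. The sketch below illustrates this mechanics on a time grid for a one-player scalar analogue of the pollution model of Section 4, whose open-loop solution on [t, t + T̄] is u(t, s) = b − d(t + T̄ − s); the numerical values and the Euler discretization are illustrative assumptions.

```python
import numpy as np

def horizon_control(t, s, b, d, Tbar):
    # Open-loop optimal control of the auxiliary problem on [t, t + Tbar]
    # for the scalar (n = 1) analogue of the pollution model of Section 4.
    return b - d * (t + Tbar - s)

def simulate_continuous_updating(x0, t0, T, Tbar, b, d, dt=0.01):
    """At each grid instant t, apply only u(t, s)|_{s = t}, as in (6)-(7)."""
    ts = np.arange(t0, T, dt)
    x, xs, us = x0, [], []
    for t in ts:
        u = horizon_control(t, t, b, d, Tbar)   # evaluate the horizon solution at s = t
        us.append(u)
        xs.append(x)
        x = x + u * dt                          # Euler step of dx/dt = u
    return ts, np.array(us), np.array(xs)

ts, us, xs = simulate_continuous_updating(x0=0.0, t0=0.0, T=10.0, Tbar=2.0, b=2.0, d=0.1)
print(us[0], xs[-1])   # the control is constant, b - d*Tbar, so the stock grows linearly
```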
There is an essential difference between the differential game with continuous updating and the classic differential game with prescribed duration Γ(x_0, t_0, T). In the classic game, the players are guided by the payoffs that they will finally obtain on the interval [t_0, T]; in the game with continuous updating, at each time instant t ∈ [t_0, T] they orient themselves toward the expected payoffs (5), which are computed using the information defined on the interval [t, t + T̄], i.e., the information they possess at the instant t. A subgame of the initial game has the form Γ(x, t, T); in the same way, we can define, at each t, a family of subgames of the differential game with continuous updating, Γ(x_{t,s}, s, t + T̄), where x_{t,s} is the state at the instant s ∈ [t, t + T̄]. We define this next.
First, we introduce the dynamics of the state:
x˙t(τ)=f(τ,xt(τ),u(t,τ)),xt(s)=xt,s.
Therefore, the payoff function of player i in a subgame with continuous updating Γ(x_{t,s}, s, t + T̄) has the form
$$K_i^t(x_{t,s}, s, t + \bar{T}; u) = \int_{s}^{t + \bar{T}} g_i[\tau, x_t(\tau), u(t, \tau)]\, d\tau, \quad i \in N,$$
where x_t(τ) satisfies (8) and u(t, τ), τ ∈ [s, t + T̄], are strategies in the subgame Γ(x_{t,s}, s, t + T̄).
3.2. Cooperative Differential Game with Continuous Updating
In a cooperative setting, before the game starts, all players agree to behave jointly in an optimal way (to cooperate).
3.2.1. The Approach to Defining the Characteristic Function on the Interval [s, t + T̄]
We introduce the notion of the characteristic function Ṽ^t(S; x, s, t + T̄), ∀S ⊆ N, defined for each subgame Γ(x_{t,s}, s, t + T̄), where s ∈ [t, t + T̄], t ∈ [t_0, +∞). Before introducing it, note that, from Equation (4), x_{t,s} depends on the initial point x. Therefore, we can replace x_{t,s} by x in the notation and write Γ(x, s, t + T̄) and K_i^t(x, s, t + T̄; u) for the subgame and the payoff function of player i in it, respectively. Thus, the characteristic function is given by
$$\tilde{V}^t(S; x, s, t + \bar{T}) =
\begin{cases}
\max\limits_{u_1, \dots, u_n} \sum\limits_{i=1}^{n} K_i^t(x, s, t + \bar{T}; u_1, \dots, u_n), & S = N,\\[4pt]
\max\limits_{u_i,\, i \in S} \sum\limits_{i \in S} K_i^t(x, s, t + \bar{T}; u_S, u^{NE}_{N \setminus S}), & S \subset N,\\[4pt]
0, & S = \varnothing,
\end{cases}$$
where x = x_t(t), t ∈ [t_0, +∞), s ∈ [t, t + T̄], as described in (4). Moreover, u_S = {u_i}_{i∈S} is the strategy profile of the players in the coalition S.
We assume that the superadditivity condition for the characteristic function Ṽ^t(S; x, s, t + T̄) is satisfied:
V˜t(S1∪S2;x,s,t+T¯)≥V˜t(S1;x,s,t+T¯)+V˜t(S2;x,s,t+T¯),∀S1,S2⊆N,S1∩S2=∅.
3.2.2. An Algorithm to Calculate Characteristic Function with Continuous Updating and the Shapley Value
The first three steps compute the elements necessary to define the characteristic function. In the last step, the Shapley value is computed.
Step 1: Optimizing the total payoff of the grand coalition with continuous updating.
We denote the cooperative differential game described above by Γ^c(x, t, t + T̄); the duration of the game is T̄. We assume that there are no inherent obstacles to cooperation between the players and that their payoffs are transferable. More specifically, we assume that before the game actually starts, the players agree to cooperate.
Definition 1.
A strategy profile ũ*(t, s) = (ũ_1*(t, s), …, ũ_n*(t, s)) is a profile of generalized open-loop cooperative strategies in the game with continuous updating if, for any fixed t ∈ [t_0, +∞), the strategy profile ũ*(t, s) consists of open-loop cooperative strategies in the game Γ^c(x, t, t + T̄).
Using generalized open-loop cooperative strategies, we can define the corresponding solution concept for the game model with continuous updating.
Definition 2.
A strategy profile u*(t) = (u_1*(t), …, u_n*(t)) is called a profile of open-loop cooperative strategies in the game with continuous updating if it is defined in the following way:
$$u^*(t) = \tilde{u}^*(t, s)\big|_{s = t}, \quad t \in [t_0, +\infty),$$
where ũ*(t, s) is defined above.
We interpret the "intrinsic time inconsistency" of the players as follows:
- u*(t) at the instant t coincides with the cooperative strategies in the game defined on the interval [t, t + T̄];
- u*(t + ϵ) at the instant t + ϵ coincides with the cooperative strategies in the game defined on the interval [t + ϵ, t + ϵ + T̄].
The trajectory x*(t) corresponding to the open-loop cooperative strategies with continuous updating u*(t) can be obtained from the system
x˙(t)=f(t,x,u*),x(t0)=x0,x∈Rl.
Here,x*(t)denotes a cooperative trajectory with continuous updating.
Let there exist a set of controls
u˜*(t,s)=(u˜1*(t,s),⋯,u˜n*(t,s)),s∈[t,t+T¯],t∈[t0,+∞)
such that
$$\max_{u_1, \dots, u_n} \sum_{i=1}^{n} K_i^t(x^*(t), t, t + \bar{T}; u_1, \dots, u_n) = \max_{u_1, \dots, u_n} \sum_{i=1}^{n} \int_{t}^{t + \bar{T}} g_i[s, x_t(s), u(t, s)]\, ds$$
subject to x_t(s) satisfying
$$\dot{x}_t(s) = f(s, x_t(s), u(t, s)), \qquad x_t(t) = x^*(t).$$
The solution x_t*(s) of system (11) corresponding to ũ*(t, s) is called the corresponding generalized cooperative trajectory.
Theorem 1.
Let
(i) f(s, ·, u(t, s)) be continuously differentiable on R^l, ∀s ∈ [t, t + T̄];
(ii) g_i(·, ·, u(t, s)) be continuously differentiable on R × R^l, ∀s ∈ [t, t + T̄].
A set of strategies {ũ_i*(t, s)}_{i∈N} provides generalized open-loop cooperative strategies in the differential game with continuous updating for problem (11) if, for any fixed t ∈ [t_0, T], there exists a costate variable ψ^t(s), s ∈ [t, t + T̄], such that the following relations are satisfied:
(1) ẋ_t*(s) = f(s, x_t*, ũ*), x_t*(t) = x*(t), for ∀s ∈ [t, t + T̄];
(2) ũ_i*(t, s) = arg max_{u_i ∈ U_i, i ∈ N} H^t(s, x_t*(s), u(t, s), ψ^t(s)), where s ∈ [t, t + T̄], i ∈ N;
(3) ψ̇^t(s) = −(∂/∂x_t) H^t(s, x_t(s), ũ*(t, s), ψ^t(s)), where s ∈ [t, t + T̄], with the transversality condition ψ^t(t + T̄) = 0.
Remark 1.
Let us fix t ≥ t_0 and consider the game Γ^c(x, t, t + T̄).
The motion equation is in the form
x˙t(s)=f(s,xt(s),u(t,s)),xt(t)=x*(t),s∈[t,t+T¯].
The payoff function of the grand coalition has the form
$$\sum_{i=1}^{n} K_i^t(x^*(t), t, t + \bar{T}; u_1(t, s), \dots, u_n(t, s)) = \sum_{i=1}^{n} \int_{t}^{t + \bar{T}} g_i[s, x_t(s), u(t, s)]\, ds.$$
For the optimization problem (12) and (13), the Hamiltonian has the form
$$H^t(s, x_t(s), u(t, s), \psi^t(s)) = \sum_{i=1}^{n} g_i[s, x_t(s), u(t, s)] + \psi^t(s) f(s, x_t(s), u(t, s)), \quad s \in [t, t + \bar{T}].$$
If ũ_i*(t, s), i ∈ N, are the generalized open-loop cooperative strategies in the differential game with continuous updating, then, as stated in Definition 1, for every fixed t ≥ t_0, ũ_i*(t, s), i ∈ N, are open-loop cooperative strategies in the game Γ^c(x*(t), t, t + T̄). Therefore, for any fixed t ≥ t_0, conditions (1)–(3) of the theorem are satisfied as necessary conditions for open-loop cooperative strategies (see [39]).
On the other hand, if for every t ≥ t_0 the Hamiltonian H^t is concave in (x_t, u(t, s)), then the conditions of the theorem are also sufficient for an open-loop cooperative solution [40].
Then, for any fixed t ∈ [t_0, +∞), we can obtain the generalized open-loop cooperative strategies ũ*(t, s) = (ũ_1*(t, s), …, ũ_n*(t, s)) and the corresponding state trajectory x_t*(s), s ∈ [t, t + T̄]. Using Definitions 1 and 2, we obtain the cooperative strategies with continuous updating {u_i*(t)}_{i∈N} and the corresponding cooperative trajectory with continuous updating x*(t), t ∈ [t_0, +∞).
In order to obtain the characteristic function of the grand coalition in the subgame Γ(x*(t), s, t + T̄), we substitute ũ*(t, τ) and x_t*(τ) into the corresponding payoff function and denote the resulting value by Ṽ^t(N; x*(t), s, t + T̄). The maximized cooperative payoff Ṽ^t(N; x*(t), s, t + T̄) can be expressed as
$$\tilde{V}^t(N; x^*(t), s, t + \bar{T}) = \sum_{i=1}^{n} \int_{s}^{t + \bar{T}} g_i[\tau, x_t^*(\tau), \tilde{u}^*(t, \tau)]\, d\tau.$$
Step 2: Computation of the generalized open-loop Nash equilibrium with continuous updating.
The problem of player i in the non-cooperative subgame Γ(x*(t), s, t + T̄) along the cooperative trajectory with continuous updating can be stated as follows:
$$\max_{u_i \in U_i} K_i^t(x^*(t), s, t + \bar{T}; u_i(t, s), \tilde{u}^{NE}_{-i}(t, s)) = \max_{u_i \in U_i} \int_{s}^{t + \bar{T}} g_i[\tau, x_t(\tau), u_i(t, \tau), \tilde{u}^{NE}_{-i}(t, \tau)]\, d\tau,$$
subject to x_t(τ) satisfying (8), where ũ_{−i}^{NE}(t, τ) = (ũ_1^{NE}(t, τ), …, ũ_{i−1}^{NE}(t, τ), ũ_{i+1}^{NE}(t, τ), …, ũ_n^{NE}(t, τ)).
In this setting, the current-value Hamiltonian function can be written as
$$H_i^t(\tau, x_t(\tau), u(t, \tau), \psi_i^t(\tau)) = g_i(\tau, x_t, u) + \psi_i^t(\tau) f(\tau, x_t, u), \quad \tau \in [s, t + \bar{T}],\; i \in N,$$
where τ ∈ [s, t + T̄], t ∈ [t_0, +∞). Using the Pontryagin maximum principle with continuous updating [12], we obtain the open-loop Nash equilibrium {ũ_i^NE(t, τ)}_{i∈N} and the corresponding trajectory x_t^NE(τ), ∀τ ∈ [s, t + T̄], t ∈ [t_0, +∞). It is then easy to derive the characteristic function of a single-player coalition: for each i = 1, 2, …, n,
$$\tilde{V}^t(\{i\}; x^*(t), s, t + \bar{T}) = \int_{s}^{t + \bar{T}} g_i[\tau, x_t^{NE}(\tau), \tilde{u}^{NE}(t, \tau)]\, d\tau, \quad \forall s \in [t, t + \bar{T}],\; t \in [t_0, +\infty).$$
Step 3: Compute the characteristic function for all remaining possible coalitions with continuous updating.
Here, we only need to consider coalitions that contain more than one player, excluding the grand coalition; there are 2^n − n − 2 such subsets. We apply the δ-characteristic function approach: the players of S maximize their total payoff ∑_{i∈S} K_i^t(x*(t), s, t + T̄; u_S, ũ_{N\S}^{NE}) along the cooperative trajectory with continuous updating x*(t), while the other players, those from N\S, use the generalized open-loop Nash equilibrium strategies
ũ_{N\S}^{NE} = {ũ_j^{NE}}_{j ∈ N\S}.
Thus, we have a two-stage construction procedure for the characteristic function: (1) find the generalized open-loop Nash equilibrium strategies ũ_i^NE(t, τ) for all players i ∈ N, as already done in Step 2; (2) "freeze" the Nash equilibrium strategies ũ_j^NE(t, τ) of the players from N\S and let the players of coalition S maximize their total payoff over u_S = {u_i}_{i∈S}. In order to compute the value function of the subgame Γ(x*(t), s, t + T̄), ∀t ∈ [t_0, +∞), s ∈ [t, t + T̄], we present the following concept.
Definition 3.
A set of strategies ũ_S* = {ũ_i*(t, τ)}_{i∈S}, τ ∈ [s, t + T̄], provides generalized open-loop optimal strategies for a coalition S ⊂ N in the subgame with continuous updating Γ(x*(t), s, t + T̄) if it is the solution, obtained by the Pontryagin maximum principle, of the following problem:
$$\max_{u_S \in U_S} \sum_{i \in S} K_i^t(x^*(t), s, t + \bar{T}; u_S, \tilde{u}^{NE}_{N \setminus S}) = \max_{u_S \in U_S} \sum_{i \in S} \int_{s}^{t + \bar{T}} g_i[\tau, x_t^S(\tau), u_S(t, \tau), \tilde{u}^{NE}_{N \setminus S}(t, \tau)]\, d\tau$$
subject to
$$\dot{x}_t^S(\tau) = f(\tau, x_t^S(\tau), u_S(t, \tau), \tilde{u}^{NE}_{N \setminus S}(t, \tau)), \qquad x_t^S(s) = x_{t,s}.$$
The Hamiltonian function of problem (14) has the form, for all S ⊂ N (the uppercase letter "S" always denotes the coalition S, e.g., x_t^S, ψ_S^t, and u_S):
$$H_S^t(\tau, x_t^S(\tau), u_S(t, \tau), \tilde{u}^{NE}_{N \setminus S}(t, \tau), \psi_S^t) = \sum_{i \in S} g_i(\tau, x_t^S, u_S, \tilde{u}^{NE}_{N \setminus S}) + \psi_S^t(\tau) f(\tau, x_t^S, u_S, \tilde{u}^{NE}_{N \setminus S}).$$
Theorem 2.
Let
(i) f(τ, ·, u(t, τ)) be continuously differentiable on R^l, ∀τ ∈ [s, t + T̄];
(ii) g_i(·, ·, u(t, τ)) be continuously differentiable on R × R^l.
A set of strategies ũ_S* = {ũ_i*(t, τ)}_{i∈S} provides generalized open-loop optimal strategies of the coalition S in the subgame with continuous updating Γ(x*(t), s, t + T̄) for problem (14) if there exist 2^n − n − 2 costate functions ψ_S^t(τ), τ ∈ [s, t + T̄], S ⊂ N, such that, for ∀s ∈ [t, t + T̄], t ∈ [t_0, +∞), the following relations are satisfied:
(1) ẋ_t^S(τ) = f(τ, x_t^S, ũ_S*(t, τ), ũ_{N\S}^{NE}), x_t^S(s) = x_{t,s}, for ∀τ ∈ [s, t + T̄];
(2) ũ_S*(t, τ) = arg max_{u_S ∈ U_S} H_S^t(x_t^S(τ), u_S(t, τ), ũ_{N\S}^{NE}(t, τ), ψ_S^t(τ)), for ∀τ ∈ [s, t + T̄], where ũ_S*(t, τ) = {ũ_i^S(t, τ)}_{i∈S};
(3) ψ̇_S^t(τ) = −(∂/∂x_t) H_S^t(τ, x_t(τ), ũ_S*(t, τ), ũ_{N\S}^{NE}(t, τ), ψ_S^t(τ)), where τ ∈ [s, t + T̄], S ⊂ N, with ψ_S^t(t + T̄) = 0, S ⊂ N.
Proof. Follow the proof of Theorem 1. □
Therefore, the agents in coalition S adopt the generalized open-loop optimal controls ũ_S*(t, τ) characterized in Theorem 2. Note that these controls are functions of the fixed time t ∈ [t_0, +∞) and of the instant time τ ∈ [s, t + T̄].
The value of the characteristic function for a coalition S ⊆ N is then computed as
$$\tilde{V}^t(S; x^*(t), s, t + \bar{T}) = \sum_{i \in S} \int_{s}^{t + \bar{T}} g_i[\tau, x_t^S(\tau), \tilde{u}_S^*(t, \tau), \tilde{u}^{NE}_{N \setminus S}(t, \tau)]\, d\tau, \quad \forall s \in [t, t + \bar{T}],\; t \in [t_0, +\infty),$$
where x_t^S(τ) is the trajectory at the instant τ ∈ [s, t + T̄] when the players in coalition S use the generalized open-loop optimal strategies ũ_S*(t, τ), while the players in N\S use the generalized open-loop Nash equilibrium strategies ũ_{N\S}^{NE}(t, τ) already derived in Step 2.
To define the characteristic function of the game model with continuous updating, suppose that the function Ṽ^t(S; x*(t), s, t + T̄), ∀S ⊆ N, is continuously differentiable in s ∈ [t, t + T̄] and integrable in t ∈ [t_0, +∞). The characteristic function of the game model with continuous updating, V(S; x*(t), t, T), is then defined as follows.
Definition 4.
The function V(S; x*(t), t, T), t ∈ [t_0, T], S ⊆ N, is the characteristic function of the differential game with continuous updating Γ(x*(t), t, T) if it is defined by the following integral:
$$V(S; x^*(t), t, T) = \int_{t}^{T} \Big( -\frac{d}{ds} \tilde{V}^{\tau'}(S; x^*(\tau'), s, \tau' + \bar{T}) \Big)\Big|_{s = \tau'}\, d\tau', \quad t \in [t_0, T],\; S \subseteq N,$$
where Ṽ^{τ′}(S; x*(τ′), s, τ′ + T̄), s ∈ [τ′, τ′ + T̄], τ′ ∈ [t, T], S ⊆ N, defined on the interval [s, τ′ + T̄], is the characteristic function of the game Γ(x*(τ′), s, τ′ + T̄).
In (15), we assume that the integral is taken over a finite time interval because only in this case can we guarantee that the values of the characteristic function with continuous updating are finite. Later, in the example model, we calculate the characteristic function and the Shapley value on a finite interval. We assume that the superadditivity condition is satisfied:
V(S1∪S2;x*(t),t,T)≥V(S1;x*(t),t,T)+V(S2;x*(t),t,T),∀S1,S2⊆N,S1∩S2=∅.
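When Ṽ^{τ′} is available only pointwise, the integral in Definition 4 can be approximated by a finite-difference derivative in s followed by a quadrature in τ′. The following sketch illustrates that recipe; the placeholder subgame value V_tilde_demo, the grid size, and the step h are assumptions for demonstration only.

```python
import numpy as np

def V_cu(V_tilde, t, T, Tbar, n_grid=400, h=1e-4):
    """Approximate Definition 4: integral over tau' in [t, T] of
    -d/ds V_tilde(tau', s, tau' + Tbar), evaluated at s = tau'."""
    taus = np.linspace(t, T, n_grid)
    # central finite difference in s at s = tau'
    integrand = np.array([-(V_tilde(tp, tp + h, tp + Tbar) - V_tilde(tp, tp - h, tp + Tbar)) / (2 * h)
                          for tp in taus])
    dtau = taus[1] - taus[0]
    # trapezoidal rule over tau'
    return dtau * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

# Placeholder subgame value: any smooth function of (tau', s, horizon endpoint) works here.
V_tilde_demo = lambda tp, s, end: (end - s) * (1.0 - 0.1 * tp)
print(V_cu(V_tilde_demo, t=0.0, T=10.0, Tbar=2.0))  # analytic value for this placeholder: 5.0
```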
Step 4: Compute the Shapley value based on the characteristic function with continuous updating.
Consider again the cooperative game model Γ^c(x*(t), t, T) with continuous updating. The players are allowed to form coalitions K ⊆ N consisting of subsets of all players; there are k players in the subset K. The imputation set of the cooperative game Γ^c(x*(t), t, T) is the set L(x*(t), t, T) = {ξ(x*(t), t, T) = (ξ_1(x*(t), t, T), …, ξ_n(x*(t), t, T))} satisfying the conditions
$$\xi_i(x^*(t), t, T) \ge V(\{i\}; x^*(t), t, T),\; i \in N; \qquad \sum_{i \in N} \xi_i(x^*(t), t, T) = V(N; x^*(t), t, T).$$
A cooperative solution, or optimality principle, is a non-empty subset of the imputation set L(x*(t), t, T). In particular, the Shapley value sh(x*(t), t, T) = (sh_1(x*(t), t, T), …, sh_n(x*(t), t, T)) is an imputation whose components are defined as
$$sh_i(x^*(t), t, T) = \sum_{K \subseteq N,\, i \in K} \frac{(k-1)!\,(n-k)!}{n!}\Big[ V(K; x^*(t), t, T) - V(K \setminus \{i\}; x^*(t), t, T) \Big],$$
where K\{i} is the relative complement of i in K, V(K; x*(t), t, T) is defined by Definition 4 and is the payoff of coalition K, and [V(K; x*(t), t, T) − V(K\{i}; x*(t), t, T)] is the marginal contribution of player i to coalition K.
There are many other cooperative optimality principles, for example, the von Neumann–Morgenstern solution, the core, and the nucleolus; in all cases, they are subsets of the game's imputation set.
4. A Cooperative Differential Game for Pollution Control
Let us consider the following game proposed by Long [41]. Countries are indexed by i ∈ N, and we denote n = |N|. It is assumed that each player has an industrial production site and that production is proportional to the emissions u_i. Therefore, a player's strategy is to decide the amount of pollutant emitted into the atmosphere.
4.1. Initial Game Model
Pollution accumulates over time. We denote by x(t) the stock of pollution at time t and assume that the countries "contribute" to the same stock of pollution. For simplicity, the evolution of the stock x(t) is represented by the following linear equation:
x˙(t)=∑i=1nui(t)−δx(t),x(t0)=x0,
where δ is a constant decay rate, in other words, the rate of absorption of pollution by nature.
In the following, we assume that the absorption coefficient δ is equal to zero:
x˙(t)=∑i=1nui(t),x(t0)=x0.
Pollution is a "public bad" because it exerts adverse effects on health, quality of life, and productivity. We assume that these adverse effects can be represented by having x as an argument of the instantaneous social welfare function F_i, with negative derivative:
Fi=Fi(x,t,ui),∂Fi∂x<0.
In each country, aggregate social welfare is taken to be the integral of the instantaneous social welfare. Thus, the payoff of the player i can be formulated as follows,
Ki(x0,t0,T;u)=∫t0T Fi(x,t,ui)dt.
For tractability, the functionFiis often assumed to take the separable form:
Fi(x,t,ui(t))=Ri(ui(t))−Di(x),
where R_i(u_i) may be thought of as the benefit (utility) from production, and D_i(x) as the "disutility" caused by pollution. Following standard practice, we take R_i(u_i) to be strictly concave and increasing in u_i, and D_i(x) to be convex and increasing in x. The possibility that D_i is linear is not ruled out.
We assume that the environmental damage cost of player i caused by the pollution stock is D_i(x) = d_i x, which is increasing and (weakly) convex. In the environmental economics literature, the typical assumption is that the production revenue of player i can be expressed as a function of emissions, namely, R_i(u_i(t)) = b_i u_i − (1/2)u_i², satisfying R_i(0) = 0, where b_i and d_i are positive parameters. For the above benefit function to be concave and increasing in emissions, we impose the restriction u_i(t) ∈ (0, b_i).
Suppose that the game is played in a cooperative scenario in which players have the opportunity to cooperate in order to achieve maximum total payoff:
$$\max_{u_1, u_2, \dots, u_n} \sum_{i=1}^{n} K_i(x_0, t_0, T; u) = \max_{u_1, u_2, \dots, u_n} \sum_{i=1}^{n} \int_{t_0}^{T} \Big( \big(b_i - \tfrac{1}{2} u_i\big) u_i - d_i x \Big)\, dt.$$
To solve the optimization problem (19) subject to (18), we invoke the Pontryagin maximum principle to characterize the solution. Note that these are linear state games, i.e., games in which the system dynamics and the utility functions are polynomials of degree 1 in the state variables and in which the control and state variables enter in a decoupled way. The class of linear state games has a very useful property: the linearity in the state variables, together with the decoupled structure between state and control variables, implies that the open-loop Nash equilibrium is Markov perfect and that the value functions are linear in the state variables.
It is straightforward to show that the optimal emission control of player i in the initial differential game model is given by
$$\tilde{u}_i(t) = b_i - \sum_{j=1}^{n} d_j\, (T - t), \quad i \in N.$$
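As a brief check of this formula (our own restatement of the standard Pontryagin argument for the joint maximization of (19) subject to (18), not a quotation from the paper), the Hamiltonian, first-order condition, and costate equation read:

```latex
H(t, x, u, \psi) = \sum_{i=1}^{n}\Big[\big(b_i - \tfrac{1}{2}u_i\big)u_i - d_i x\Big] + \psi(t)\sum_{i=1}^{n} u_i,
\qquad
\frac{\partial H}{\partial u_i} = b_i - u_i + \psi = 0,
\\[4pt]
\dot{\psi}(t) = -\frac{\partial H}{\partial x} = \sum_{j=1}^{n} d_j, \quad \psi(T) = 0
\;\Longrightarrow\;
\psi(t) = -\sum_{j=1}^{n} d_j\,(T - t), \qquad
\tilde{u}_i(t) = b_i + \psi(t) = b_i - \sum_{j=1}^{n} d_j\,(T - t).
```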
To obtain the cooperative state trajectory of the initial differential game, it suffices to insert ũ_i(t) from (20) into the dynamics and solve the differential equation to obtain
$$\tilde{x}^*(t) = x_0 + \Big( \sum_{i=1}^{n} b_i - n \sum_{i=1}^{n} d_i\, T \Big)(t - t_0) + n \sum_{i=1}^{n} d_i\, \frac{t^2 - t_0^2}{2}.$$
4.2. A Pollution Control Game Model with Continuous Updating
In the gameΓ(x,t,t+T¯), the dynamics of the total amount of pollutionxt(s)is described by
xt˙(s)=∑i=1nui(t,s),xt(t)=x,
in which we assume that the absorption coefficient corresponding to the natural purification of the atmosphere is equal to zero.
The instantaneous payoff ofi-thplayer is defined as
Ri(ui(t,s))=bi ui(t,s)−12ui2(t,s),i∈N.
Each player also bears a cost caused by pollution. Therefore, the instantaneous utility of the i-th player equals R_i(u_i(t, s)) − d_i x_t(s), where d_i > 0.
Thus the payoff of the player i is defined as
$$K_i^t(x, t, t + \bar{T}; u) = \int_{t}^{t + \bar{T}} \Big( \big(b_i - \tfrac{1}{2} u_i\big) u_i - d_i x_t \Big)\, ds,$$
where u_i = u_i(t, s) is the control of player i at the instant s ∈ [t, t + T̄], and x_t = x_t(s) is the accumulated pollution at the same instant s.
Therefore, the payoff function of player i in the subgame with continuous updating Γ(x, s, t + T̄) is given by
$$K_i^t(x, s, t + \bar{T}; u) = \int_{s}^{t + \bar{T}} \Big( \big(b_i - \tfrac{1}{2} u_i(t, \tau)\big) u_i(t, \tau) - d_i x_t(\tau) \Big)\, d\tau, \quad i \in N,$$
where x_t(τ) and u(t, τ), τ ∈ [s, t + T̄], are the trajectory and the strategies in the game Γ(x, s, t + T̄). The dynamics of the state are given by
x˙t(τ)=∑i=1nui(t,τ),xt(s)=xt,s.
Step 1: Optimizing the total payment of the grand coalition with continuous updating.
Consider the game in cooperative form; this means that all players work together to maximize their total payoff. We seek the optimal strategy profile ũ*(t, s) = (ũ_1*(t, s), …, ũ_n*(t, s)) such that ∑_{i=1}^n K_i^t → max over u_1, u_2, …, u_n.
The optimization problem is as follows,
$$\sum_{i=1}^{n} K_i^t(x, t, t + \bar{T}; u) = \sum_{i=1}^{n} \int_{t}^{t + \bar{T}} \Big( \big(b_i - \tfrac{1}{2} u_i(t, s)\big) u_i(t, s) - d_i x_t(s) \Big)\, ds \to \max_{u_1, u_2, \dots, u_n}$$
subject to x_t(s) satisfying (22).
In order to deal with the problem (24), we use the classical Pontryagin maximum principle. The corresponding Hamiltonian is
$$H^t(s, x_t(s), u(t, s), \psi^t(s)) = \sum_{i=1}^{n} \big(b_i - \tfrac{1}{2} u_i\big) u_i - \sum_{i=1}^{n} d_i x_t + \psi^t(s)\,(u_1 + u_2 + \dots + u_n).$$
The first-order partial derivatives w.r.t. the u_i are
$$\frac{\partial H^t}{\partial u_i}(s, x_t, u, \psi^t) = b_i - u_i + \psi^t = 0,$$
and the Hessian matrix ∂²H^t/∂u² (s, x_t, u, ψ^t) is negative definite, so we can conclude that the Hamiltonian H^t is concave with respect to u. From the first-order conditions, we obtain the cooperative strategies
$$\tilde{u}_i^*(t, s) = b_i + \psi^t(s).$$
By the Pontryagin maximum principle, the costate variable satisfies
$$\dot{\psi}^t(s) = \sum_{i=1}^{n} d_i, \qquad \psi^t(t + \bar{T}) = 0,$$
and, setting ∑_{i=1}^n d_i = d_N, we get ψ^t(s) = −d_N(t + T̄ − s). Finally, the cooperative strategies take the form
$$\tilde{u}_i^*(t, s) = b_i - d_N (t + \bar{T} - s),$$
and from (22) we get the optimal (cooperative) trajectory:
$$x_t^*(s) = x + b_N (s - t) - n d_N (t + \bar{T})(s - t) + n d_N\, \frac{s^2 - t^2}{2},$$
where x = x_t(t), d_N = ∑_{i=1}^n d_i, and b_N = ∑_{i=1}^n b_i.
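For completeness, this trajectory follows by direct integration of (22) with the cooperative controls just derived; the intermediate step (our own sketch) is:

```latex
\dot{x}_t(s) = \sum_{i=1}^{n} \tilde{u}_i^*(t, s) = b_N - n d_N (t + \bar{T} - s)
\;\Longrightarrow\;
x_t^*(s) = x + \int_t^s \Big[ b_N - n d_N (t + \bar{T} - \sigma) \Big] d\sigma
= x + b_N (s - t) - n d_N (t + \bar{T})(s - t) + n d_N \frac{s^2 - t^2}{2}.
```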
According to the procedure (10), we construct open-loop optimal cooperative strategies with continuous updating:
$$u_i^*(t) = \tilde{u}_i^*(t, s)\big|_{s = t} = b_i - d_N \bar{T}.$$
After substituting u_i*(t) into the differential Equation (18), we arrive at the optimal cooperative trajectory x*(t) with continuous updating:
$$x^*(t) = x_0 + b_N (t - t_0) - n d_N \bar{T}\, (t - t_0).$$
The comparison of the cooperative strategies and the corresponding trajectories between the initial differential game model and the differential game with continuous updating is shown graphically in Figures 1 and 2.
From Figure 1, we can see that the optimal control with continuous updating is more stable than the optimal control in the initial game model. We can also see that, from the time t = 4, the optimal control in the initial game exceeds the one with continuous updating, which means that in the initial differential game model players should increase the pollution emitted into the atmosphere, a harmful outcome. This occurs because in the initial game model the players have full information about the game on the interval [t_0, T]; they are more cautious and do not dare to emit too much pollution at first. In real life, however, it is impossible to have information for the whole time interval. Therefore, we consider the game with continuous updating, where at each time instant t the players only have information on [t, t + T̄]. In the case of continuous updating, the players are "braver" and emit more pollution at first because they lack information about the whole game.
From Figure 2, we can see that from t = 0 to t = 8 the pollution accumulation in the initial game model is less than in the model with continuous updating: in the initial game model the players know the information for the whole time interval, which leads to lower pollution accumulation because they are cautious. Starting from t = 8, the pollution with continuous updating is less than in the initial game because, as time goes on, the information available to the players in the model with continuous updating becomes close to that in the initial game model. Using the continuous updating approach thus makes the modeling more consistent with the actual situation.
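The qualitative comparison just described can be reproduced from the closed-form strategies and trajectories above. The following sketch plots both pairs of curves for one player; the parameter values are illustrative assumptions, not the values behind Figures 1 and 2 (with this particular choice, the emission curves cross at t = T − T̄ = 4 and the pollution stocks at t = 2(T − T̄) = 8).

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative parameter values (assumptions for this sketch).
n, T, Tbar, t0, x0 = 3, 10.0, 6.0, 0.0, 1.0
b = np.array([2.0, 2.5, 3.0])
d = np.array([0.02, 0.03, 0.05])
bN, dN = b.sum(), d.sum()

t = np.linspace(t0, T, 500)
# Initial game: cooperative control and trajectory from the closed forms above.
u_init = b[0] - dN * (T - t)
x_init = x0 + (bN - n * dN * T) * (t - t0) + n * dN * (t**2 - t0**2) / 2
# Continuous updating: constant control b_i - d_N*Tbar and linear trajectory.
u_cu = np.full_like(t, b[0] - dN * Tbar)
x_cu = x0 + (bN - n * dN * Tbar) * (t - t0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(t, u_init, label="initial game")
ax1.plot(t, u_cu, label="continuous updating")
ax1.set_xlabel("t"); ax1.set_ylabel("emission of player 1"); ax1.legend()
ax2.plot(t, x_init, label="initial game")
ax2.plot(t, x_cu, label="continuous updating")
ax2.set_xlabel("t"); ax2.set_ylabel("pollution stock"); ax2.legend()
plt.tight_layout()
plt.show()
```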
Next, for a given subgame Γ(x*(t), s, t + T̄) of the differential game with continuous updating Γ(x*(t), t, t + T̄), the characteristic function of the grand coalition N is given by Ṽ^t(N; x*(t), s, t + T̄), which can be represented as
$$\tilde{V}^t(N; x^*(t), s, t + \bar{T}) = \sum_{i=1}^{n} \int_{s}^{t + \bar{T}} \Big( \big(b_i - \tfrac{1}{2}\tilde{u}_i^*(t, \tau)\big)\tilde{u}_i^*(t, \tau) - d_i x_t^*(\tau) \Big)\, d\tau,$$
where x_t*(τ) satisfies (26) with x = x*(t), and ũ_i*(t, τ) satisfies (25). Therefore, we obtain the value function of the grand coalition N:
$$\tilde{V}^t(N; x^*(t), s, t + \bar{T}) = (t + \bar{T} - s)\Big[ \tfrac{1}{2}\tilde{b}_N - d_N x^*(t) + \frac{d_N b_N}{2}(t - \bar{T} - s) - \frac{1}{3} n d_N^2 \Big( (t + \bar{T} - s)^2 - \tfrac{3}{2}\bar{T}^2 \Big) \Big],$$
where d_N = ∑_{i=1}^n d_i and b_N = ∑_{i=1}^n b_i have been defined above, and b̃_N = ∑_{i=1}^n b_i². Note that in (29), x*(t) represents the cooperative pollution stock with continuous updating at time t.
For our problem, we can also use the dynamic programming method based on the Hamilton–Jacobi–Bellman equation. It is straightforward to verify that the Bellman function has the form Ṽ^t = A(t, s) x_t(s) + B(t, s), and we get the same result as with the Pontryagin maximum principle.
Step 2: The computation of the generalized open-loop Nash equilibrium with continuous updating.
The Hamiltonian for each player i = 1, 2, …, n is
$$H_i^t(\tau, x_t(\tau), u(t, \tau), \psi_i^t(\tau)) = \big(b_i - \tfrac{1}{2} u_i\big) u_i - d_i x_t + \psi_i^t(\tau)\,(u_1 + u_2 + \dots + u_n),$$
and its first-order partial derivative w.r.t. u_i is
$$\frac{\partial H_i^t}{\partial u_i}(\tau, x_t(\tau), u(t, \tau), \psi_i^t(\tau)) = b_i - u_i + \psi_i^t = 0.$$
Since ∂²H_i^t/∂u_i² (τ, x_t, u, ψ_i^t) is negative, the Hamiltonian H_i^t is concave w.r.t. u_i. We obtain the equilibrium controls
$$\tilde{u}_i^{NE}(t, \tau) = b_i - d_i (t + \bar{T} - \tau), \quad i = 1, 2, \dots, n.$$
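The costate step behind this expression is not spelled out in the text; a short sketch (assuming the damage term −d_i x_t in player i's Hamiltonian, as written above) reads:

```latex
\dot{\psi}_i^t(\tau) = -\frac{\partial H_i^t}{\partial x_t} = d_i, \qquad \psi_i^t(t + \bar{T}) = 0
\;\Longrightarrow\;
\psi_i^t(\tau) = -d_i (t + \bar{T} - \tau), \qquad
\tilde{u}_i^{NE}(t, \tau) = b_i + \psi_i^t(\tau) = b_i - d_i (t + \bar{T} - \tau).
```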
For the subgame starting at the instant s ∈ [t, t + T̄], we can easily derive the corresponding trajectory (for the Nash equilibrium case) of the subgame Γ(x*(t), s, t + T̄) along the cooperative trajectory, in other words with x_{t,s} = x_t*(s):
$$x_t^{NE}(\tau) = x_t^*(s) + b_N(\tau - s) + \frac{d_N}{2}\Big( (t + \bar{T} - \tau)^2 - (t + \bar{T} - s)^2 \Big) = x^*(t) + b_N(\tau - t) - n d_N (t + \bar{T})(s - t) + n d_N\, \frac{s^2 - t^2}{2} + \frac{d_N}{2}\Big( (t + \bar{T} - \tau)^2 - (t + \bar{T} - s)^2 \Big).$$
The maximal payoff of each player i = 1, 2, …, n in the subgame starting from the instant s and the state x*(t) has the form
$$\tilde{V}^t(\{i\}; x^*(t), s, t + \bar{T}) = \int_{s}^{t + \bar{T}} \Big( \big(b_i - \tfrac{1}{2}\tilde{u}_i^{NE}(t, \tau)\big)\tilde{u}_i^{NE}(t, \tau) - d_i x_t^{NE}(\tau) \Big)\, d\tau = (t + \bar{T} - s)\Big[ \frac{b_i^2}{2} - d_i x^*(t) + \frac{n d_i d_N}{2}(s - t)(t - s + 2\bar{T}) - \frac{1}{6} d_i^2 (t + \bar{T} - s)^2 + \frac{1}{3} d_i d_N (t + \bar{T} - s)^2 - \frac{1}{2} d_i b_N (s + \bar{T} - t) \Big].$$
Step 3: The computation of the characteristic function for all remaining possible coalitions in the differential game with continuous updating.
It is possible to calculate the controls and the corresponding value functions for different coalitions; their form, however, depends on how we define the respective optimal control problems.
Let us construct the characteristic function based on the δ-c.f. approach. The characteristic function of coalition S is calculated in two stages: the first stage is already done (we found the Nash equilibrium strategies of each player in Step 2); at the second stage, it is assumed that the remaining players j ∈ N\S carry out their Nash equilibrium strategies ũ_j^NE(t, τ), while the players from coalition S seek to maximize their joint payoff ∑_{i∈S} K_i. Consider the case of a coalition S; it is instructive to perform the calculations in detail. The respective Hamiltonian for the coalition S is
$$H_S^t(\tau, x_t(\tau), u(t, \tau), \psi_S^t(\tau)) = \sum_{i \in S} \big(b_i - \tfrac{1}{2} u_i\big) u_i - d_S x_t + \psi_S^t(\tau)\Big( \sum_{i \in S} u_i + \sum_{j \in N \setminus S} \tilde{u}_j^{NE} \Big),$$
where d_S = ∑_{i∈S} d_i. Note that we substituted u_j, j ∈ N\S, by ũ_j^NE, which was found earlier.
The optimal strategies of the players in coalition S are ũ_S*(t, τ), which satisfy
$$\tilde{u}_i^*(t, \tau) = b_i + \psi_S^t(\tau), \quad i \in S.$$
The differential equation for ψ_S^t(τ) is ψ̇_S^t(τ) = d_S, which is solved with the terminal condition ψ_S^t(t + T̄) = 0. Eventually, we get ψ_S^t(τ) = −d_S(t + T̄ − τ). Substituting the obtained expression for ψ_S^t(τ) into ũ_i*, we get
$$\tilde{u}_i^*(t, \tau) = b_i - d_S (t + \bar{T} - \tau), \quad i \in S,$$
where d_S = ∑_{i∈S} d_i. We see that the players of coalition S implement their optimal strategies ũ_S*, while the players outside the coalition adhere to their Nash equilibrium strategies.
In the next step, we integrate (23), starting from the point x_t*(s), to get x_t^S(τ):
$$x_t^S(\tau) = x_t^*(s) + b_N(\tau - s) + \frac{(t + \bar{T} - \tau)^2 - (t + \bar{T} - s)^2}{2}\big( (k - 1)\, d_S + d_N \big).$$
Here, x_t^S(τ) is the trajectory in the subgame Γ(x*(t), s, t + T̄) of the game Γ(x*(t), t, t + T̄) starting at the instant s from x_t*(s), when the players from coalition S use the strategies ũ_S*(t, τ) and the players from N\S use ũ^NE(t, τ). Considering the case along the cooperative trajectory, we can substitute the x_t*(s) already obtained in (26) to express the state variable x_t^S(τ) through x = x*(t):
$$x_t^S(\tau) = x^*(t) + b_N(\tau - t) - n d_N (t + \bar{T})(s - t) + n d_N\, \frac{s^2 - t^2}{2} + \frac{(t + \bar{T} - \tau)^2 - (t + \bar{T} - s)^2}{2}\big( (k - 1)\, d_S + d_N \big).$$
The respective value of the characteristic function Ṽ^t(S; x*(t), s, t + T̄) is
$$\tilde{V}^t(S; x^*(t), s, t + \bar{T}) = (t + \bar{T} - s)\Big[ \tfrac{1}{2}\tilde{b}_S - d_S x^*(t) + \frac{n d_S d_N}{2}(t + 2\bar{T} - s)(s - t) + (t + \bar{T} - s)^2\Big( \frac{k - 2}{6}\, d_S^2 + \frac{d_S d_N}{3} \Big) - \frac{d_S b_N}{2}(s + \bar{T} - t) \Big],$$
where b̃_S = ∑_{i∈S} b_i² and k = |S|.
According to Definition 4, the characteristic function of a differential game with continuous updating has the following form,
$$V(S; x^*(t), t, T) = \int_{t}^{T} \Big( -\frac{d}{ds} \tilde{V}^{\tau'}(S; x^*(\tau'), s, \tau' + \bar{T}) \Big)\Big|_{s = \tau'}\, d\tau'
= (T - t)\Big[ \tfrac{1}{2}\tilde{b}_S - d_S x_0 + \bar{T}^2\Big( \frac{k - 2}{2}\, d_S^2 + (1 - n)\, d_S d_N \Big) - \frac{d_S (b_N - n d_N \bar{T})}{2}(T + t - 2t_0) \Big].$$
Let us check the superadditivity condition (16) for the constructed characteristic function V(S; x*(t), t, T). It turns out that, for any S, P ⊆ N with S ∩ P = ∅ and |S| = k ≥ 1, |P| = m ≥ 1, the following holds:
$$V(S \cup P; x^*(t), t, T) - V(S; x^*(t), t, T) - V(P; x^*(t), t, T) = (T - t)\,\bar{T}^2\Big[ (k + m - 2)\, d_S d_P + \frac{k}{2} d_P^2 + \frac{m}{2} d_S^2 \Big] \ge 0.$$
Thus, the δ-characteristic function V(S; x*(t), t, T) is superadditive without any additional conditions on the parameters of the model.
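As an explicit instance (our own illustration), take two singleton coalitions S = {i} and P = {j}, so that k = m = 1 and the cross term vanishes:

```latex
V(\{i, j\}; x^*(t), t, T) - V(\{i\}; x^*(t), t, T) - V(\{j\}; x^*(t), t, T)
= (T - t)\,\bar{T}^2\Big[ \tfrac{1}{2} d_j^2 + \tfrac{1}{2} d_i^2 \Big] \ge 0.
```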
In the following figure, we compare the characteristic function of the grand coalition N in the initial game model and in the differential game with continuous updating.
Figure 3 shows that the value of the characteristic function in the initial model is greater than in the model with continuous updating; the reason is that the limited information in the continuous updating setting reduces the effectiveness of the coalition. It should be noted that the continuous updating case is more realistic. We can conclude that the payoff of the coalition decreases over time because pollution accumulates in the atmosphere; each player's payoff depends on the pollution level and decreases as pollution increases. It should also be noted that the coalition's effectiveness decreases faster in the initial game model than in the model with continuous updating.
Step 4: Compute the Shapley value based on the characteristic function with continuous updating.
Any of the known optimality principles can be applied to find a cooperative solution. First, note that the expression ∑ d_j d_l (where d_k d_j = d_j d_k) represents the interaction of the players' costs. Now consider the cooperative solution of the differential game with continuous updating. According to procedure (17), we construct the Shapley value for every i ∈ N with continuous updating, using the characteristic function with continuous updating, and obtain
$$sh_i(x^*(t), t, T) = \sum_{K \subseteq N,\, i \in K} \frac{(k-1)!\,(n-k)!}{n!}\Big[ V(K; x^*(t), t, T) - V(K \setminus \{i\}; x^*(t), t, T) \Big]$$
$$= (T - t)\Bigg[ -d_i x_0 + \frac{1}{2} b_i^2 + \frac{n d_i d_N \bar{T} - d_i b_N}{2}\,(T + t - 2t_0) + \bar{T}^2\Bigg( \frac{1 - 2n}{3}\, d_i d_N - \frac{4 + n}{12}\, d_i^2 + \frac{1}{3} \sum_{\substack{j, l \in N,\; j, l \ne i \\ j \ne l}} d_j d_l + \frac{1}{4}\,\tilde{d}_N \Bigg) \Bigg],$$
where d̃_N = ∑_{i=1}^n d_i² and the sum over j, l is taken over unordered pairs.
The graphic representation of the Shapley value for the game with continuous updating and for the initial game model along the optimal cooperative trajectory x*(t) is shown in Figure 4.
Figure 4 shows that, in the more realistic case of continuous updating, a player receives a smaller allocation from the coalition than in the initial game model. This is because, at the early stage, more pollution is emitted into the atmosphere than in the initial game model, while at the later stage the pollution with continuous updating is lower. Thus, starting from t = 0, players get more in the initial game model, but in both models the allocations decrease to 0 by the end of the game. This shows that, as pollution intensifies, the benefits countries receive from the associated production gradually decrease.
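As a numerical cross-check (our own illustration, with assumed parameter values rather than those behind Figure 4), the closed-form coalition value above can be fed into the general Shapley formula (17); by efficiency, the components must sum to V(N; x*(t), t, T). The coalition value used below follows the reconstruction of the formula given above.

```python
from itertools import combinations
from math import factorial
import numpy as np

# Illustrative parameters (assumptions, not the values used for Figure 4).
b = np.array([2.0, 2.5, 3.0]); d = np.array([0.02, 0.03, 0.05])
n, T, Tbar, t0, x0 = len(b), 10.0, 6.0, 0.0, 1.0
dN, bN = d.sum(), b.sum()

def V(S, t):
    """Coalition value V(S; x*(t), t, T) per the closed form above; S is a tuple of player indices."""
    if len(S) == 0:
        return 0.0
    k = len(S)
    dS = d[list(S)].sum()
    bS2 = (b[list(S)] ** 2).sum()
    return (T - t) * (0.5 * bS2 - dS * x0
                      + Tbar ** 2 * ((k - 2) / 2 * dS ** 2 + (1 - n) * dS * dN)
                      - dS * (bN - n * dN * Tbar) / 2 * (T + t - 2 * t0))

def shapley(t):
    sh = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of K \ {i}
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for rest in combinations(others, k):
                sh[i] += w * (V(tuple(rest) + (i,), t) - V(rest, t))
    return sh

sh0 = shapley(0.0)
print(sh0, sh0.sum(), V(tuple(range(n)), 0.0))  # efficiency: the components sum to V(N; x*(0), 0, T)
```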
5. Conclusions
In this paper, we presented a detailed treatment of a cooperative differential game model with continuous updating based on the Pontryagin maximum principle, where the decision-makers update their behavior using the new information that arises from a shifting time horizon. The characteristic function with continuous updating is constructed using the Pontryagin maximum principle for the cooperative case. The results show that the δ-characteristic function computed for the game is superadditive without any additional restrictions on the model's parameters. The Shapley value as a cooperative solution with continuous updating is obtained in analytic form for the pollution control problem. Finally, for the example of n-player pollution control, the optimal strategies, the corresponding trajectory, the characteristic function, and the Shapley value with continuous updating are derived and compared graphically with those of the initial game. The simulation results demonstrate the applicability of the approach.
The practical significance of the work stems from the fact that real-life conflict-controlled processes evolve continuously in time, and the players usually do not, or cannot, use full information about them. Therefore, it is important to introduce differential games with information updating into the field of game theory. Another important practical contribution of the continuous updating approach is the creation of a class of inverse optimal control problems with continuous updating [17], which can be used to analyze a human profile in human-machine engineering systems. In [17], the results are illustrated on a model of a driver assistance system and are applied to real driving data from the simulator located at the Institute of Control Systems, Karlsruhe Institute of Technology. Our method can provide more in-depth modeling of human-machine engineering systems.
Author Contributions
Conceptualization, J.Z.; Data curation, A.T.; Formal analysis, A.T.; Funding acquisition, O.P.; Investigation, O.P.; Methodology, J.Z.; Project administration, H.G.; Resources, H.G.; Software, J.Z.; Supervision, O.P. and H.G.; Validation, A.T.; Visualization, H.G.; Writing-original draft, J.Z.; Writing-review and editing, O.P. All authors have read and agreed to the published version of the manuscript.
Funding
This work is supported by Postdoctoral International Exchange Program of China and funded by the Russian Foundation for Basic Research (RFBR) according to the Grant No. 18-00-00727 (18-00-00725), and the National Natural Science Foundation of China (Grant No. 71571108).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The corresponding author would like to acknowledge the support from the China-Russia Operations Research and Management Cooperation Research Center, that is an association between Qingdao University and St. Petersburg State University.
Conflicts of Interest
The authors declare no conflict of interest.
© 2021. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the "License").
Abstract
We consider a class of cooperative differential games with continuous updating making use of the Pontryagin maximum principle. It is assumed that at each moment, players have or use information about the game structure defined in a closed time interval of a fixed duration. Over time, information about the game structure will be updated. The subject of the current paper is to construct players’ cooperative strategies, their cooperative trajectory, the characteristic function, and the cooperative solution for this class of differential games with continuous updating, particularly by using Pontryagin’s maximum principle as the optimality conditions. In order to demonstrate this method’s novelty, we propose to compare cooperative strategies, trajectories, characteristic functions, and corresponding Shapley values for a classic (initial) differential game and a differential game with continuous updating. Our approach provides a means of more profound modeling of conflict controlled processes. In a particular example, we demonstrate that players’ behavior is braver at the beginning of the game with continuous updating because they lack the information for the whole game, and they are “intrinsically time-inconsistent”. In contrast, in the initial model, the players are more cautious, which implies they dare not emit too much pollution at first.