1. Introduction
Many applications in science and engineering require solving an inverse problem of finding x ∈ Rn satisfying
Bx = b,
where b ∈ Rr is the observed data and B ∈ Rr×n is the corresponding nonzero matrix. In practice, the inverse problem typically suffers from ill-conditioning of the matrix B, so that it may have no solution. A common approach is then to seek an approximate solution by minimizing the squared norm of the residual:
minimize ∥Bx − b∥² subject to x ∈ Rn. (2)
Observe that the problem (2) may have several optimal solutions; in this situation, it is not clear which of these solutions should be chosen. One strategy for selecting the best optimal solution among them is to add a regularization term to the objective function. The classical technique is the celebrated Tikhonov regularization [1] of the form
minimize ∥Bx − b∥² + λ∥x∥² subject to x ∈ Rn, (3)
where λ > 0 is a regularization parameter. In this setting, the uniqueness of the solution to (3) is guaranteed. However, from a practical point of view, the shortcoming of this strategy is that the unique solution to the regularization problem (3) may not be optimal in the original sense of (2); see [2,3] for further discussions.
To overcome this, one may instead select a specific solution among the optimal solutions to (2) by minimizing an additional prior function over these optimal solutions. This leads to the following bilevel optimization problem,
minimize f(x) + h(Ax) subject to x ∈ argmin_{u∈Rn} ∥Bu − b∥², (4)
where f : Rn → R and h : Rm → R are convex functions, and A : Rn → Rm is a nonzero linear transformation. It is worth pointing out that many problems can be formulated in this form. For instance, if m := n − 1, f := ∥·∥1, h := ∥·∥1, and
A := \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix} \in \mathbb{R}^{(n-1)\times n},
the problem (4) becomes the fused lasso [4] solution to the problem (2). This situation also occurs in image denoising problems (r = n and B is the identity matrix) and in image inpainting problems (r = n and B is a diagonal matrix), where the term ∥Ax∥1 is known as the 1D total variation [5]. When m = n, f = ∥·∥2, h = ∥·∥1, and A is the identity matrix, the problem (4) becomes the elastic net [6] solution to the problem (2). Moreover, in wavelet-based image restoration problems, the matrix A is given by an inverse wavelet transform [7].
Let us consider the constraint set of (4). It is known that introducing the Landweber operator T : Rn → Rn of the form
Tx := x − (1/∥B∥²) B⊤(Bx − b)
yields that T is firmly nonexpansive and that the set of all fixed points of T is nothing else than the set argmin_{u∈Rn} ∥Bu − b∥²; see [8] for more details. Motivated by this observation, the bilevel problem (4) can be considered in the more general setting
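For concreteness, the Landweber operator admits a direct implementation. The following Python/NumPy sketch (an illustration under our own naming, not code from the paper) builds T from given B and b and checks numerically that a plain fixed-point iteration of T approaches a least-squares solution:

```python
import numpy as np

def landweber(B, b):
    """Return the Landweber operator T(x) = x - (1/||B||^2) B^T (B x - b).

    T is firmly nonexpansive and Fix T = argmin_u ||B u - b||^2.
    """
    norm_B_sq = np.linalg.norm(B, 2) ** 2   # ||B||^2 = lambda_max(B^T B)
    def T(x):
        return x - B.T @ (B @ x - b) / norm_B_sq
    return T

# A fixed point of T is a least-squares solution of Bx = b.
B = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 2.0])
T = landweber(B, b)
x = np.zeros(2)
for _ in range(5000):                        # plain fixed-point iteration
    x = T(x)
x_ls, *_ = np.linalg.lstsq(B, b, rcond=None)
print(np.allclose(x, x_ls, atol=1e-6))       # True
```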
minimize f(x) + h(Ax) subject to x ∈ Fix T, (5)
where T : Rn → Rn is a nonlinear operator with Fix T := {x ∈ Rn : Tx = x} ≠ ∅. Note that problem (5) encompasses not only the problem (4) but also many problems in the literature, for instance, minimization over the intersection of a finite number of sublevel sets of convex nonsmooth functions (see Section 5.2), and minimization over the intersection of many convex sets for which the metric projection onto the intersection cannot be computed explicitly; see [9,10,11] for more details.
There are several existing methods for solving convex optimization problems over fixed-point sets of the form (5); the most celebrated one is the hybrid steepest descent method, which was first investigated in [12]. Note that the algorithm proposed by Yamada [12] relies on the hypotheses that the objective function is strongly convex and smooth and that the operator T is nonexpansive. Several variants and generalizations of this well-known method exist: for instance, Yamada and Ogura [11] considered the same scheme for solving the problem (5) when T belongs to the class of so-called quasi-shrinking operators; Cegielski [10] proposed a generalized hybrid steepest descent method using a sequence of quasi-nonexpansive operators; and Iiduka [13,14] considered the nonsmooth convex optimization problem (5) with fixed-point constraints of certain quasi-nonexpansive operators.
On the other hand, in the recent decade, the split common fixed point problem [15,16] has become one of the most attractive nonlinear problems due to its wide applications in image and signal processing. Given a nonzero linear transformation A : Rn → Rm and two nonlinear operators T : Rn → Rn and S : Rm → Rm with Fix(T) ≠ ∅ and R(A) ∩ Fix(S) ≠ ∅, the split common fixed point problem is to find a point x* ∈ Rn such that
x* ∈ Fix T and Ax* ∈ Fix S.
The key idea of this problem is to find a fixed point of a nonlinear operator in a primal space whose image under an appropriate linear transformation is a fixed point of another nonlinear operator in another space. This situation appears, for instance, in dynamic emission tomographic image reconstruction [17] and in intensity-modulated radiation therapy treatment planning; see [18] for more details. Many authors have investigated iterative algorithms for split common fixed point problems and proposed generalizations in several directions; see, for example, [9,19,20,21,22] and references therein.
The aim of this paper is to present a nonsmooth and non-strongly convex version of the hybrid steepest descent method for minimizing the sum of two convex functions over fixed-point constraints of the form
minimize f(x) + h(Ax) subject to x ∈ X ∩ Fix T, Ax ∈ Fix S,
where f : Rn → R and h : Rm → R are convex nonsmooth functions, A : Rn → Rm is a nonzero linear transformation, T : Rn → Rn and S : Rm → Rm are certain quasi-nonexpansive operators with Fix(T) ≠ ∅ and R(A) ∩ Fix(S) ≠ ∅, and X ⊂ Rn is a simple closed convex bounded set. We prove convergence of the function values to the minimum value under suitable control conditions on the step-size sequence and a parameter.
The paper is organized as follows. After recalling and introducing some useful notions and tools in Section 2, we present our algorithm and discuss its convergence analysis in Section 3. Furthermore, in Section 4, we discuss an important implication of our problem and algorithm for minimizing a sum of convex functions over coupling constraints. In Section 5, we discuss in detail some noteworthy practical applications, and Section 6 describes the results of numerical experiments on a fused lasso type problem. Finally, conclusions are given in Section 7.
2. Preliminaries
We summarize some useful notations, definitions, and properties, which we will utilize later. For further details, the reader may consult the well-known books [8,23,24,25].
Let Rn be an n-dimensional Euclidean space with an inner product ⟨·,·⟩ and the corresponding norm ∥·∥.
Let T : Rn → Rn be an operator. We denote the set of all fixed points of T by Fix T, that is,
Fix T := {x ∈ Rn : Tx = x}.
We say that T is ρ-strongly quasi-nonexpansive (ρ-SQNE), where ρ ≥ 0, if Fix T ≠ ∅ and
∥Tx − z∥² ≤ ∥x − z∥² − ρ∥Tx − x∥²,
for all x ∈ Rn and z ∈ Fix T. If ρ > 0, then T is called strongly quasi-nonexpansive (SQNE). If ρ = 0, then T is called quasi-nonexpansive (QNE), that is,
∥Tx − z∥ ≤ ∥x − z∥,
for all x ∈ Rn and z ∈ Fix T. Clearly, if T is SQNE, then it is QNE. We say that T is a cutter if Fix T ≠ ∅ and
⟨x − Tx, z − Tx⟩ ≤ 0,
for all x ∈ Rn and all z ∈ Fix T. We say that T is firmly nonexpansive (FNE) if
⟨Tx − Ty, x − y⟩ ≥ ∥Tx − Ty∥²,
for all x, y ∈ Rn.
The following properties will be applied in the next sections.
Fact 1
(Lemma 2.1.21 [8]). If T : Rn → Rn is QNE, then Fix T is closed and convex.
Fact 2
(Theorem 2.2.5 [8]). If T : Rn → Rn is FNE with Fix T ≠ ∅, then T is a cutter.
Let f : Rn → R be a function and x ∈ Rn. The subdifferential of f at x is the set
∂f(x) := {x* ∈ Rn : f(y) ≥ f(x) + ⟨x*, y − x⟩ for all y ∈ Rn}.
If ∂f(x) ≠ ∅, then an element f′(x) ∈ ∂f(x) is called a subgradient of f at x.
Fact 3
(Corollary 16.15 [24]). Let f : Rn → R be a convex function. Then, the subdifferential ∂f(x) ≠ ∅ for all x ∈ Rn.
Fact 4
(Proposition 16.17 [24]). Let f : Rn → R be a convex function. Then, the subdifferential ∂f maps every bounded subset of Rn to a bounded set.
As we work in the n-dimensional Euclidean space, we will use the notion of a matrix instead of that of a linear transformation throughout this work. Denote by Rm×n the set of all real m × n matrices. Let A ∈ Rm×n be given. We denote by R(A) := {y ∈ Rm : y = Ax for some x ∈ Rn} its range and by A⊤ its transpose. We denote the induced norm of A by ∥A∥, which is given by ∥A∥ = √λmax(A⊤A), where λmax(A⊤A) is the maximum eigenvalue of the matrix A⊤A.
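As a quick numerical sanity check (a small NumPy sketch, not part of the paper), the induced norm can be computed either directly or through the largest eigenvalue of A⊤A:

```python
import numpy as np

# The induced (spectral) norm ||A|| equals sqrt(lambda_max(A^T A)).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0]])
spectral_norm = np.linalg.norm(A, 2)
lam_max = np.max(np.linalg.eigvalsh(A.T @ A))
print(np.isclose(spectral_norm, np.sqrt(lam_max)))  # True
```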
3. Method and Its Convergence
Now, we formulate the composite nonsmooth convex minimization problem over the intersection of fixed-point sets which we aim to investigate throughout this paper.
Problem 1.
Let Rn and Rm be two Euclidean spaces. Assume that
(A1)
f : Rn → R and h : Rm → R are convex functions.
(A2)
A ∈ Rm×n is a nonzero matrix.
(A3)
T : Rn → Rn is QNE, and S : Rm → Rm is a cutter with R(A) ∩ Fix(S) ≠ ∅.
(A4)
X is a nonempty, closed, convex, bounded, and simple subset of Rn.
Our objective is to solve
minimize f(x) + h(Ax) subject to x ∈ X ∩ Fix T, Ax ∈ Fix S.
Throughout this work, we denote the solution set of Problem 1 by Γ and assume that it is nonempty.
Problem 1 can be viewed as a bilevel problem in which data are given from two sources in a system. Indeed, consider a system of two users at different sources (they may have different numbers of factors, n and m) who can communicate with each other via the transformation A. The first user aims to find the best solutions with respect to the criterion f among the feasible points represented as the fixed-point set of an appropriate operator T. Similarly, the second user has its own objective of the same type: finding the best solutions among the feasible points in Fix S with respect to the prior criterion h. To find the best solutions of this system, we consider the fixed-point subgradient splitting method (FSSM for short) stated in Algorithm 1.
Algorithm 1: Fixed-Point Subgradient Splitting Method.
Initialization: Choose a positive sequence {αk}k≥1, a parameter γ ∈ (0, +∞), and an arbitrary x1 ∈ Rn.
Iterative Step: For a given xk ∈ Rn, compute
zk := SAxk − (αk/γ) h′(SAxk), where h′(SAxk) ∈ ∂h(SAxk),
yk := xk + γA⊤(zk − Axk),
xk+1 := PX(Tyk − αk f′(Tyk)), where f′(Tyk) ∈ ∂f(Tyk).
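To make the scheme concrete, the following Python/NumPy sketch (an illustrative implementation under our own assumptions; the paper does not provide code) runs Algorithm 1 for user-supplied operators T and S, a projection onto X, and subgradient oracles for f and h:

```python
import numpy as np

def fssm(x1, A, T, S, proj_X, subgrad_f, subgrad_h, gamma, alpha, n_iter=1000):
    """Fixed-Point Subgradient Splitting Method (Algorithm 1), illustrative sketch.

    x1        : starting point in R^n
    A         : (m, n) matrix
    T, S      : callables implementing the operators T and S
    proj_X    : callable projecting onto the simple closed convex bounded set X
    subgrad_f : callable returning some element of the subdifferential of f
    subgrad_h : callable returning some element of the subdifferential of h
    gamma     : parameter, intended to satisfy 0 < gamma < 1 / ||A||^2
    alpha     : callable k -> alpha_k, e.g. lambda k: 0.1 / k
    """
    x = x1.copy()
    for k in range(1, n_iter + 1):
        SAx = S(A @ x)
        z = SAx - (alpha(k) / gamma) * subgrad_h(SAx)
        y = x + gamma * (A.T @ (z - A @ x))
        Ty = T(y)
        x = proj_X(Ty - alpha(k) * subgrad_f(Ty))
    return x
```

Here alpha can be any step-size rule; the diminishing choice alpha = lambda k: a / k with a > 0 matches the control conditions used in the convergence analysis below.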
Remark 1.
Actually, this algorithm simultaneously enjoys the following features: (i) splitting of the computations, (ii) a simple scheme, and (iii) boundedness of the iterates. Concerning the first feature, the iterative scheme processes a subgradient of f and the operator T in the space Rn and a subgradient of h and the operator S in the space Rm separately. Regarding the simplicity of the iterative scheme, we need not compute the inverse of the matrix A; the transpose of A is enough. Finally, the third feature is typically required when proving convergence of subgradient-type methods. Of course, boundedness is often imposed in image processing and machine learning in the form of a (large) box constraint or a large Euclidean ball.
To study the convergence properties of the function values of a sequence generated by Algorithm 1, we start with the following technical result.
Lemma 1.
Let {xk}k≥1 be a sequence generated by Algorithm 1. Then, for every k ≥ 1 and u ∈ X ∩ Fix T ∩ A−1(Fix S), it holds that
∥xk+1 − u∥² ≤ ∥xk − u∥² − (1 − γ∥A∥²)γ∥zk − Axk∥² + αk²∥f′(Tyk)∥² + ((γ² + 2γ + 2)/γ)αk²∥h′(SAxk)∥² + 2αk[(f(u) + h(Au)) − (f(Tyk) + h(SAxk))].
Proof.
Let k ≥ 1 be arbitrary. By the definition of {yk}k≥1, we have
∥yk − u∥² = ∥xk − u∥² + γ²∥A⊤(zk − Axk)∥² + 2γ⟨xk − u, A⊤(zk − Axk)⟩ ≤ ∥xk − u∥² + γ²∥A∥²∥zk − Axk∥² + 2γ⟨Axk − Au, zk − Axk⟩. (6)
Now, by using the definition of {zk}k≥1 and the cutter property of S, we derive
⟨Axk − Au, zk − Axk⟩ = ⟨Axk − zk, zk − Axk⟩ + ⟨zk − Au, zk − Axk⟩
= −∥zk − Axk∥² + ⟨SAxk − Au, zk − Axk⟩ − (αk/γ)⟨h′(SAxk), zk − Axk⟩
= −∥zk − Axk∥² + ⟨SAxk − Au, SAxk − Axk⟩ − (αk/γ)⟨SAxk − Au, h′(SAxk)⟩ + (αk²/γ²)∥h′(SAxk)∥² − (αk/γ)⟨h′(SAxk), SAxk − Axk⟩
≤ −∥zk − Axk∥² + (αk²/γ²)∥h′(SAxk)∥² − (αk/γ)⟨h′(SAxk), SAxk − Au⟩ − (αk/γ)⟨h′(SAxk), SAxk − Axk⟩,
which in turn implies that (6) becomes
∥yk − u∥² ≤ ∥xk − u∥² + (γ∥A∥² − 2)γ∥zk − Axk∥² + (2αk²/γ)∥h′(SAxk)∥² − 2αk⟨h′(SAxk), SAxk − Au⟩ − 2αk⟨h′(SAxk), SAxk − Axk⟩. (7)
We now focus on the last two terms of the right-hand side of (7).
Observe that
0 ≤ (1/(2γ))∥γ(SAxk − Axk) + 2αk h′(SAxk)∥² = (γ/2)∥SAxk − Axk∥² + (2αk²/γ)∥h′(SAxk)∥² + 2αk⟨SAxk − Axk, h′(SAxk)⟩,
thus, by the definition of{zk}k≥1, we obtain
−2αk⟨h′(SAxk), SAxk − Axk⟩ ≤ (γ/2)∥SAxk − Axk∥² + (2αk²/γ)∥h′(SAxk)∥² ≤ γ∥zk − Axk∥² + (2 + γ)αk²∥h′(SAxk)∥². (8)
Now, inequalities (6)–(8) together give
∥yk − u∥² ≤ ∥xk − u∥² − (1 − γ∥A∥²)γ∥zk − Axk∥² + (2/γ + 2 + γ)αk²∥h′(SAxk)∥² − 2αk⟨h′(SAxk), SAxk − Au⟩. (9)
On the other hand, using the definition of{xk}k≥1and the assumption that T is QNE, we obtain
∥xk+1 − u∥² ≤ ∥Tyk − αk f′(Tyk) − u∥² = ∥Tyk − u∥² + αk²∥f′(Tyk)∥² − 2αk⟨f′(Tyk), Tyk − u⟩ ≤ ∥yk − u∥² + αk²∥f′(Tyk)∥² − 2αk⟨f′(Tyk), Tyk − u⟩. (10)
Substituting (9) into (10), we obtain
∥xk+1 − u∥² ≤ ∥xk − u∥² − (1 − γ∥A∥²)γ∥zk − Axk∥² + αk²∥f′(Tyk)∥² + ((γ² + 2γ + 2)/γ)αk²∥h′(SAxk)∥² − 2αk⟨f′(Tyk), Tyk − u⟩ − 2αk⟨h′(SAxk), SAxk − Au⟩. (11)
Next, the convexity of f and h gives
⟨f′(Tyk), u − Tyk⟩ ≤ f(u) − f(Tyk)
and
⟨h′(SAxk), Au − SAxk⟩ ≤ h(Au) − h(SAxk).
By making use of these two inequalities in (11), we obtain
∥xk+1 − u∥² ≤ ∥xk − u∥² − (1 − γ∥A∥²)γ∥zk − Axk∥² + αk²∥f′(Tyk)∥² + ((γ² + 2γ + 2)/γ)αk²∥h′(SAxk)∥² + 2αk[(f(u) + h(Au)) − (f(Tyk) + h(SAxk))],
which is the required inequality and the proof is completed. ☐
The following lemma is very useful for the convergence result.
Lemma 2.
Let {xk}k≥1 be a sequence generated by Algorithm 1. Then, {xk}k≥1 is bounded. Furthermore, if 0 < γ < 1/∥A∥² and {αk}k≥1 is bounded, then the sequences {SAxk}k≥1, {h′(SAxk)}k≥1, {zk}k≥1, {yk}k≥1, {Tyk}k≥1, and {f′(Tyk)}k≥1 are bounded.
Proof.
As X is a bounded set, it is clear that the sequence {xk}k≥1 is bounded. Now, let u ∈ Γ be given. The linearity of A and the quasi-nonexpansiveness of S yield
∥SAxk∥ ≤ ∥SAxk − Au∥ + ∥Au∥ ≤ ∥A∥∥xk − u∥ + ∥Au∥.
This implies that {SAxk}k≥1 is bounded. Consequently, applying Fact 4, we obtain that {h′(SAxk)}k≥1 is also bounded.
By the triangle inequality, we have
∥zk∥ ≤ ∥SAxk∥ + (αk/γ)∥h′(SAxk)∥.
Therefore, the boundedness of {αk}k≥1 implies that {zk}k≥1 is bounded. Consequently, the triangle inequality and the linearity of A⊤ yield the boundedness of {yk}k≥1. As T is QNE, {Tyk}k≥1 is bounded. Thus, {f′(Tyk)}k≥1 is bounded by Fact 4. ☐
For the sake of simplicity, we let
(f + h∘A)* := inf_{z ∈ X ∩ Fix T ∩ A−1(Fix S)} (f(z) + h(Az))
and assume that (f + h∘A)* > −∞.
We consider the convergence property of the objective values under a diminishing step size in the following theorem.
Theorem 1.
Let {xk}k≥1 be a sequence generated by Algorithm 1. If the following control conditions hold:
(i)
0 < γ < 1/∥A∥²;
(ii)
∑_{k=1}^{∞} αk = +∞ and ∑_{k=1}^{∞} αk² < +∞;
then
lim inf_{k→+∞} (f(Tyk) + h(SAxk)) ≤ (f + h∘A)*.
Proof.
Let z ∈ Γ be given. We note from Lemma 1 that, for every k ≥ 1,
2αk[(f(Tyk) + h(SAxk)) − (f(z) + h(Az))] ≤ ∥xk − z∥² − ∥xk+1 − z∥² + αk²∥f′(Tyk)∥² + ((γ² + 2γ + 2)/γ)αk²∥h′(SAxk)∥², (12)
which is true because −(1 − γ∥A∥²)γ∥zk − Axk∥² ≤ 0 by assumption (i). Summing up (12) for i = 1, …, k, we obtain
2∑_{i=1}^{k} αi[(f(Tyi) + h(SAxi)) − (f + h∘A)*] ≤ ∥x1 − z∥² + M∑_{i=1}^{k} αi²,
where
M := sup_{k≥1} { ∥f′(Tyk)∥² + ((γ² + 2γ + 2)/γ)∥h′(SAxk)∥² }.
This implies that
∑_{k=1}^{∞} αk[(f(Tyk) + h(SAxk)) − (f + h∘A)*] < +∞.
Next, we show that lim inf_{k→+∞} [(f(Tyk) + h(SAxk)) − (f + h∘A)*] ≤ 0. Suppose, by contradiction, that
lim inf_{k→+∞} [(f(Tyk) + h(SAxk)) − (f + h∘A)*] > 0.
Then there exist k0 ≥ 1 and ε > 0 such that
(f(Tyk) + h(SAxk)) − (f + h∘A)* ≥ ε
for all k ≥ k0. Thus, we have
+∞ = ε∑_{k=k0}^{∞} αk ≤ ∑_{k=k0}^{∞} αk[(f(Tyk) + h(SAxk)) − (f + h∘A)*] < +∞,
which is a contradiction. Therefore, we can conclude that
lim inf_{k→+∞} (f(Tyk) + h(SAxk)) ≤ (f + h∘A)*.
☐
Remark 2.
The convergence result obtained in Theorem 1 is slightly different from the convergence results obtained by the classical gradient method or even the projected gradient method, namely lim inf_{k→+∞} (f(Tyk) + h(SAxk)) = (f + h∘A)*. This is because, in each iteration, we cannot ensure that the estimate Tyk belongs to the constraint set Fix(T), which means that the property f(Tyk) ≥ f* may fail in general. Similarly, we cannot ensure that h(SAxk) ≥ (h∘A)*.
Remark 3.
A step-size sequence satisfying the assumptions ∑_{k=1}^{∞} αk = +∞ and ∑_{k=1}^{∞} αk² < +∞ is, for instance, {a/k}k≥1 where a > 0.
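Indeed, for αk = a/k both conditions follow from standard facts about the harmonic series and the series ∑ 1/k²:
∑_{k=1}^{∞} αk = a∑_{k=1}^{∞} 1/k = +∞ and ∑_{k=1}^{∞} αk² = a²∑_{k=1}^{∞} 1/k² = a²π²/6 < +∞.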
4. Convex Minimization Involving a Sum of Composite Functions
The aim of this section is to show that Algorithm 1 and its convergence properties can be employed for solving a convex minimization problem involving a sum of a finite number of composite functions.
Let us consider the composite convex minimization problem:
minimize f(x) + ∑_{i=1}^{l} hi(Ai x) subject to x ∈ X ∩ Fix T, Ai x ∈ Fix Si, i = 1, …, l, (13)
where we assume further that, for all i = 1, …, l, the following hold:
(I)
Ai ∈ Rmi×n are nonzero matrices,
(II)
Si : Rmi → Rmi are cutter operators with R(Ai) ∩ Fix Si ≠ ∅, and
(III)
hi : Rmi → R are convex functions.
In this section, we denote the solution set of (13) by Ω and assume that it is nonempty.
Denote the product space
Rm := Rm1 × Rm2 × ⋯ × Rml,
equipped with the addition
x + y := (x1 + y1, x2 + y2, …, xl + yl),
the scalar multiplication
αx := (αx1, αx2, …, αxl),
the inner product defined by
⟨⟨x, y⟩⟩Rm := ∑_{i=1}^{l} ⟨xi, yi⟩Rmi,
and the norm
∥x∥Rm := √⟨⟨x, x⟩⟩Rm,
for all x = (x1, x2, …, xl), y = (y1, y2, …, yl) ∈ Rm; this is again a Euclidean space (see [24], Example 2.1). Define a matrix A ∈ Rm×n by
A := [A1⊤ | A2⊤ | ⋯ | Al⊤]⊤,
and an operator S : Rm → Rm by
S(y) := (S1 y1, S2 y2, …, Sl yl),
for all y = (y1, y2, …, yl) ∈ Rm. Note that the operator S is a cutter with
Fix(S) = Fix(S1) × ⋯ × Fix(Sl).
Furthermore, defining a function h : Rm → R by
h(x) := ∑_{i=1}^{l} hi(xi),
for all x = (x1, x2, …, xl) ∈ Rm, we also have that the function h is convex (see [24], Proposition 8.25). By the above setting, we can rewrite the problem (13) as
minimize f(x) + h(Ax) subject to x ∈ X ∩ Fix T, Ax ∈ Fix S,
which is nothing else than Problem 1.
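As an illustration of this product-space construction (a Python/NumPy sketch under our own naming, not taken from the paper), the stacked matrix A, the blockwise operator S, and the separable function h with a subgradient oracle can be assembled directly from the individual blocks and then passed to an implementation of Algorithm 1 such as the fssm sketch given in Section 3:

```python
import numpy as np

def stack_problem(A_list, S_list, h_list, subgrad_h_list):
    """Assemble the product-space data (A, S, h, h') from blocks A_i, S_i, h_i."""
    sizes = [A_i.shape[0] for A_i in A_list]
    A = np.vstack(A_list)                      # A = [A_1; A_2; ...; A_l]

    def split(y):                              # split y in R^m into its l blocks
        return np.split(y, np.cumsum(sizes)[:-1])

    def S(y):                                  # S(y) = (S_1 y_1, ..., S_l y_l)
        return np.concatenate([S_i(y_i) for S_i, y_i in zip(S_list, split(y))])

    def h(y):                                  # h(y) = sum_i h_i(y_i)
        return sum(h_i(y_i) for h_i, y_i in zip(h_list, split(y)))

    def subgrad_h(y):                          # a subgradient: stack of block subgradients
        return np.concatenate([g_i(y_i) for g_i, y_i in zip(subgrad_h_list, split(y))])

    return A, S, h, subgrad_h
```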
Here, to investigate solving the problem (13), we state Algorithm 2 as follows.
Algorithm 2: Distributed Fixed-Point Subgradient Splitting Method.
Initialization: Choose a positive sequence {αk}k≥1, a parameter γ ∈ (0, +∞), and an arbitrary x1 ∈ Rn.
Iterative Step: For a given xk ∈ Rn, compute
zk,i := Si Ai xk − (αk/γ)dk,i, where dk,i ∈ ∂hi(Si Ai xk), for all i = 1, …, l,
yk := xk + γ∑_{i=1}^{l} Ai⊤(zk,i − Ai xk),
xk+1 := PX(Tyk − αk f′(Tyk)), where f′(Tyk) ∈ ∂f(Tyk).
As a consequence of the above construction, we note that
A⊤x = ∑_{i=1}^{l} Ai⊤ xi,
for all x = (x1, x2, …, xl) ∈ Rm. Furthermore, we know that
∂h(SAxk) = ∂h1(S1 A1 xk) × ⋯ × ∂hl(Sl Al xk);
see ([25], Corollary 2.4.5). Putting dk := (dk,1, …, dk,l), where dk,i ∈ ∂hi(Si Ai xk), i = 1, …, l, for all k ≥ 1, we obtain that
zk := (zk,1, …, zk,l) = SAxk − (αk/γ)dk
for all k ≥ 1. Notice that
A⊤(zk − Axk) = ∑_{i=1}^{l} Ai⊤(zk,i − Ai xk)
for all k ≥ 1. Thus, Algorithm 2 can be rewritten as
zk = SAxk − (αk/γ)dk, where dk ∈ ∂h(SAxk),
yk = xk + γA⊤(zk − Axk),
xk+1 = PX(Tyk − αk f′(Tyk)), where f′(Tyk) ∈ ∂f(Tyk),
for all k ≥ 1. Since ∥A∥² ≤ ∑_{i=1}^{l} ∥Ai∥², the convergence result follows immediately from Theorem 1 and can be stated as the following corollary.
Corollary 1.
Let {xk}k≥1 be a sequence generated by Algorithm 2. If the following control conditions hold:
(i)
0 < γ < 1/∑_{i=1}^{l} ∥Ai∥²;
(ii)
∑_{k=1}^{∞} αk = +∞ and ∑_{k=1}^{∞} αk² < +∞;
then
lim inf_{k→+∞} (f(Tyk) + ∑_{i=1}^{l} hi(Si Ai xk)) ≤ (f + ∑_{i=1}^{l} hi∘Ai)*.
5. Related Problems
The aim of this section is to show that Algorithm 1 and its convergence results can be employed for solving some well-known problems. Furthermore, we also present some noteworthy applications related to Algorithm 1 and its convergence results. For simplicity, we assume here that S = Id, the identity operator.
5.1. Convex Minimization with Least Squares Constraint
Let us discuss the composite minimization problem over the set of all minimizers of the proximity function of a system of linear equations:
minimize f(x) + h(Ax) subject to x ∈ X ∩ argmin g, g := (1/2)∥B· − b∥², (14)
where B is a nonzero p × n matrix and b is a p × 1 vector. Now, we define the Landweber operator by
L(x) := x − (1/∥B∥²)B⊤(Bx − b).
Since B is nonzero, we have ∥B∥² = λmax(B⊤B) ≠ 0, which yields that the Landweber operator L is well defined. Furthermore, if argmin g ≠ ∅, then L is FNE with Fix L = argmin g ([8], Lemma 4.6.2 and Theorem 4.6.3). In view of T := L, the problem (14) is a special case of Problem 1, so that the problem (14) can be solved by Algorithm 1. Furthermore, if the set {u ∈ Rn : Bu = b} ≠ ∅, the problem (14) is equivalent to
minimize f(x) + h(Ax) subject to Bx = b, x ∈ X.
5.2. Convex Minimization with Nonsmooth Functional Constraints
Let us discuss a convex minimization problem with nonsmooth functional constraints:
minimize f(x) + h(Ax) subject to gi(x) ≤ 0, i = 1, …, l, x ∈ X, (15)
where gi : Rn → R are convex functions, i = 1, …, l. Denote
Ξ(gi, 0) := {x ∈ Rn : gi(x) ≤ 0}
and assume that
⋂_{i=1}^{l} Ξ(gi, 0) ≠ ∅.
Furthermore, define the subgradient projection related to gi by
Pi x := x − (gi(x)/∥gi′(x)∥²) gi′(x) if gi(x) > 0, and Pi x := x otherwise,
where gi′(x) is a (fixed) subgradient of gi at x ∈ Rn. It is well known that Pi is a cutter (a 1-SQNE operator) with Fix Pi = Ξ(gi, 0). Note that, if ⋂_{i=1}^{l} Ξ(gi, 0) ≠ ∅, the cyclic subgradient projection operator
P := Pl Pl−1 ⋯ P1,
which is a composition of the SQNE operators Pi, is SQNE with
Fix P = ⋂_{i=1}^{l} Fix Pi = ⋂_{i=1}^{l} Ξ(gi, 0)
([8], Theorem 2.1.50). In 2018, Cegielski and Nimana [26] proposed the following extrapolated operator,
Pλ,σ(x) := x + λσ(x)(Px − x),
where λ ∈ (0, 2) and σ : Rn → R is a step-size function defined by
σ(x) = σmax(x) := ∑_{i=1}^{l} ⟨Px − Ui−1x, Ui x − Ui−1x⟩ / ∥Px − x∥² if x ∉ ⋂_{i=1}^{l} Ξ(gi, 0), and σ(x) := 1 otherwise,
where Ui := Pi Pi−1 ⋯ P1 for i = 1, 2, …, l and U0 := Id. Note that the operator Pλ,σ is SQNE with Fix Pλ,σ = Fix P ≠ ∅ ([26], Theorem 3.2).
By means of T := Pλ,σ or T := P, the problem (15) is nothing else than Problem 1, and Algorithm 1 is applicable to the problem (15).
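As a small illustration (again a Python/NumPy sketch under our own assumptions, not the author's code), the subgradient projection Pi and the cyclic composition P can be implemented directly from a list of constraint functions and corresponding subgradient oracles:

```python
import numpy as np

def subgradient_projection(g, subgrad_g):
    """Return the subgradient projection operator P_i for a convex constraint g(x) <= 0."""
    def P(x):
        val = g(x)
        if val <= 0:
            return x
        s = subgrad_g(x)                 # any subgradient of g at x; nonzero since g(x) > 0
        return x - (val / np.dot(s, s)) * s
    return P

def cyclic_operator(P_list):
    """Return P = P_l P_{l-1} ... P_1, the cyclic composition of the projections."""
    def P(x):
        for P_i in P_list:               # apply P_1 first, then P_2, ..., P_l
            x = P_i(x)
        return x
    return P

# Example: two half-space constraints g_1(x) = x_1 - 1 <= 0 and g_2(x) = -x_2 <= 0.
P1 = subgradient_projection(lambda x: x[0] - 1.0, lambda x: np.array([1.0, 0.0]))
P2 = subgradient_projection(lambda x: -x[1],      lambda x: np.array([0.0, -1.0]))
P = cyclic_operator([P1, P2])
print(P(np.array([3.0, -2.0])))          # lands in {x : x_1 <= 1, x_2 >= 0}
```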
5.3. Convex Minimization with Complex Constraints
Let B : Rn ⇉ Rn be a set-valued operator. We denote by
Gr(B) := {(x, u) ∈ Rn × Rn : u ∈ Bx}
its graph and by
zer(B) := {z ∈ Rn : 0 ∈ B(z)}
the set of all zeros of the operator B. The set-valued operator B is said to be monotone if
⟨x − y, u − v⟩ ≥ 0
for all (x, u), (y, v) ∈ Gr(B), and it is called maximally monotone if its graph is not properly contained in the graph of any other monotone operator. For a set-valued operator B : Rn ⇉ Rn, we define the resolvent of B, JB : Rn ⇉ Rn, by
JB := (Id + B)−1.
It is well known that if B is maximally monotone and r > 0, then the resolvent of rB is (single-valued) FNE with
zer(B) = Fix JrB;
see ([24], Corollary 23.31 and Proposition 23.38). Now, let us consider the minimal norm-like solution of the classical monotone inclusion problem:
minimize f(x) + h(Ax) subject to x ∈ X ∩ zer(B), (16)
where B : Rn ⇉ Rn is a maximally monotone operator such that zer(B) ≠ ∅. For given r > 0 and putting T := JrB, the problem (16) is nothing else than Problem 1, and Algorithm 1 is also applicable to the problem (16).
In particular, let us consider the following minimal norm-like solution of a minimization problem,
minimize f(x) + h(Ax) subject to x ∈ X ∩ argmin φ. (17)
The problem (17) has been considered by many authors; see, for instance, [2,27,28,29,30] and references therein. Recall that, for a given r > 0 and a proper convex lower semicontinuous function φ : Rn → (−∞, +∞], we denote by proxrφ(x) the proximal point of parameter r of φ at x, which is the unique optimal solution of the optimization problem
min { φ(u) + (1/(2r))∥u − x∥² : u ∈ Rn }.
Note that proxrφ = Jr∂φ. Therefore, putting T := proxrφ, Algorithm 1 is also applicable for the problem (17).
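For instance (a standard example, sketched here in Python under our own naming), when φ := ∥·∥1 the proximal operator has the closed-form soft-thresholding expression, so the operator T := proxrφ is cheap to evaluate:

```python
import numpy as np

def prox_l1(x, r):
    """prox_{r ||.||_1}(x): the unique minimizer of ||u||_1 + (1/(2r))||u - x||^2.

    Componentwise soft thresholding with threshold r.
    """
    return np.sign(x) * np.maximum(np.abs(x) - r, 0.0)

print(prox_l1(np.array([2.0, -0.3, 0.7]), r=0.5))   # [ 1.5 -0.   0.2]
```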
6. Numerical Experiments
In this section, to demonstrate the effectiveness of the fixed-point subgradient splitting method (Algorithm 1), we apply the proposed method to solve a fused lasso type problem. All experiments were performed under MATLAB 9.6 (R2019a) running on a MacBook Pro 13-inch, 2019 with a 2.4 GHz Intel Core i5 processor and 8 GB 2133 MHz LPDDR3 memory.
Let a design matrix A := [a1 | ⋯ | ar]⊤ ∈ Rr×s, where ai = (a1i, …, asi) ∈ Rs, and a response vector b = (b1, …, br) ∈ Rr be given. We consider the fused lasso type problem of the form
minimize ∥x∥1 + ∥Dx∥1 subject to x ∈ [−1, 1]s ∩ argmin (1/2)∥A· − b∥², (18)
where
D := \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix} \in \mathbb{R}^{(s-1)\times s}.
Observe that, by setting the functions f(x) := ∥x∥1 and h(x) := ∥x∥1, taking D as the linear transformation of Problem 1, the Landweber operator
T(x) := x − (1/∥A∥²)A⊤(Ax − b)
associated with the design matrix A, S := Id (the identity operator), and the constraint set X := [−1, 1]s, we obtain that the problem (18) is a special case of Problem 1, so that it can be solved by Algorithm 1 (see Section 5.1 for more details).
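The following Python/NumPy sketch (an illustrative reproduction of this setup under our own assumptions; the paper's experiments were carried out in MATLAB, and the data generation here is simplified) instantiates the pieces above and runs the iteration of Algorithm 1 on a small random instance:

```python
import numpy as np

rng = np.random.default_rng(0)
r, s = 20, 50
A = rng.normal(size=(r, s))                       # design matrix (stands in for the paper's A)
x_true = np.zeros(s)
x_true[rng.choice(s, s // 10, replace=False)] = rng.normal(size=s // 10)
b = A @ x_true + rng.normal(size=r)

# First-difference matrix D in R^{(s-1) x s}.
D = np.zeros((s - 1, s))
for i in range(s - 1):
    D[i, i], D[i, i + 1] = -1.0, 1.0

T = lambda x: x - A.T @ (A @ x - b) / np.linalg.norm(A, 2) ** 2   # Landweber operator
S = lambda y: y                                                    # S = Id
proj_X = lambda x: np.clip(x, -1.0, 1.0)                           # X = [-1, 1]^s
subgrad_f = np.sign                                                 # a subgradient of ||.||_1
subgrad_h = np.sign
gamma = 0.9 / np.linalg.norm(D, 2) ** 2                             # 0 < gamma < 1/||D||^2
alpha = lambda k: 0.1 / k

x = rng.uniform(-1.0, 1.0, size=s)
for k in range(1, 20001):                          # Algorithm 1 with the transformation D
    SDx = S(D @ x)
    z = SDx - (alpha(k) / gamma) * subgrad_h(SDx)
    y = x + gamma * (D.T @ (z - D @ x))
    Ty = T(y)
    x = proj_X(Ty - alpha(k) * subgrad_f(Ty))

print(np.linalg.norm(x, 1) + np.linalg.norm(D @ x, 1))   # objective value at the last iterate
```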
We generate the matrix A with normally distributed random entries in (−10, 10), with a given percentage pA of nonzero elements. We generate the vector b = (b1, …, br) ∈ Rr corresponding to A by the linear model b = Ax0 + ϵ, where ϵ ∼ N(0, ∥Ax0∥²) and the vector x0 has 10% nonzero components generated from a normal distribution. The initial point is a vector whose coordinates are chosen randomly in (−1, 1). In the numerical experiments, denoting the estimate
Fk := f(Tyk) + h(SAxk)
for all k ≥ 1, we consider the behavior of the average of the relative changes
(1/k)∑_{j=1}^{k} |Fj+1 − Fj| / (|Fj| + 1)
with optimality tolerance 10−3. We performed 10 independent tests for each combination of dimensions (r, s) and percentage of nonzero elements of A, and for various step-size parameters αk. The results are shown in Table 1, where the average number of iterations (#Iters) and the average CPU time (Time) needed to reach the optimality tolerance are presented for each combination of parameters.
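For reference, the averaged relative change used as the stopping quantity can be computed from the history of objective estimates as follows (a small sketch; the exact implementation used in the experiments is not given in the paper):

```python
import numpy as np

def averaged_relative_change(F):
    """Average of |F_{j+1} - F_j| / (|F_j| + 1) over the history F_1, ..., F_k."""
    F = np.asarray(F, dtype=float)
    return np.mean(np.abs(np.diff(F)) / (np.abs(F[:-1]) + 1.0))

print(averaged_relative_change([10.0, 9.0, 8.5, 8.4]) < 1e-3)   # False: not yet converged
```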
We see that the method with parameter αk = 0.1/k behaves significantly better than the others in terms of both the average number of iterations and the average CPU time for all dimensions and percentages of nonzero elements of A. Moreover, in the case pA = 10%, we observed a much larger average number of iterations and CPU time for the parameter choices αk = 0.3/k, 0.5/k, 0.7/k, and 0.9/k for all problem sizes.
7. Conclusions
In this paper, we introduced a simple fixed-point subgradient splitting method, whose main feature is the combination of the subgradient method with the hybrid steepest descent method relative to a nonlinear operator. We performed the convergence analysis of the method by proving convergence of the function values to the minimum value. The result is obtained under suitable assumptions on the step sizes. We discussed in detail applications of the proposed scheme to some remarkable problems in the literature. Numerical experiments on a fused lasso type problem give evidence of the performance of the proposed method.
Funding
This work was financially supported by the young researcher development project of Khon Kaen University.
Acknowledgments
The author is thankful to two anonymous reviewers and the Assistant Editor for comments and remarks which improved the quality of the paper. The author also thanks Satit Saejung for his suggestions under the young researcher development project of Khon Kaen University.
Conflicts of Interest
The author declares no conflicts of interest.
Table
Table 1. Influence of step sizes {αk}k≥1 for different sizes and different percentages of nonzero elements of a random matrix A.
| (r,s) | pA | #Iters (0.1/k) | Time (0.1/k) | #Iters (0.3/k) | Time (0.3/k) | #Iters (0.5/k) | Time (0.5/k) | #Iters (0.7/k) | Time (0.7/k) | #Iters (0.9/k) | Time (0.9/k) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (20,50) | 10% | 54,253 | 6.33 | 87,529 | 13.56 | 87,416 | 13.30 | 87,347 | 13.35 | 87,455 | 13.45 |
| | 20% | 58,085 | 6.99 | 71,969 | 9.85 | 71,813 | 9.72 | 71,831 | 9.77 | 71,928 | 9.85 |
| | 50% | 55,414 | 6.38 | 63,113 | 8.13 | 62,962 | 8.00 | 62,914 | 8.02 | 62,969 | 8.10 |
| | 80% | 56,190 | 6.46 | 57,780 | 6.97 | 57,459 | 6.82 | 57,170 | 6.80 | 56,949 | 6.81 |
| | 100% | 56,960 | 6.60 | 54,084 | 6.32 | 53,800 | 6.19 | 53,630 | 6.18 | 53,538 | 6.22 |
| (50,100) | 10% | 111,358 | 25.63 | 137,970 | 42.04 | 137,347 | 42.44 | 137,099 | 42.86 | 137,197 | 42.49 |
| | 20% | 108,019 | 25.21 | 128,821 | 34.51 | 128,621 | 34.85 | 128,536 | 35.51 | 128,496 | 35.50 |
| | 50% | 108,760 | 25.54 | 117,274 | 28.93 | 116,965 | 28.77 | 116,818 | 29.01 | 116,751 | 28.80 |
| | 80% | 108,409 | 25.30 | 105,886 | 24.94 | 105,621 | 24.81 | 105,529 | 25.00 | 105,726 | 24.86 |
| | 100% | 107,967 | 25.42 | 105,388 | 24.81 | 104,766 | 24.55 | 104,582 | 24.70 | 104,519 | 24.49 |
| (100,200) | 10% | 209,683 | 117.26 | 365,219 | 347.96 | 364,925 | 353.51 | 364,907 | 354.07 | 364,951 | 354.03 |
| | 20% | 210,069 | 116.14 | 285,174 | 193.44 | 285,100 | 198.02 | 285,048 | 197.57 | 285,049 | 197.01 |
| | 50% | 208,808 | 116.02 | 281,421 | 185.71 | 281,167 | 188.84 | 280,913 | 190.41 | 280,756 | 189.61 |
| | 80% | 213,331 | 118.26 | 212,045 | 119.80 | 211,676 | 122.81 | 211,647 | 122.87 | 211,749 | 127.19 |
| | 100% | 213,904 | 114.29 | 197,259 | 104.91 | 197,104 | 107.81 | 197,023 | 108.38 | 196,954 | 108.52 |
1. Tikhonov, A.N.; Arsenin, V.Y. Solution of Ill-Posed Problems; V.H. Winston: Washington, DC, USA, 1977.
2. Beck, A.; Sabach, S. A first order method for finding minimal norm-like solutions of convex optimization problems. Math. Program. 2014, 147, 25-46.
3. Ono, S.; Yamada, I. Hierarchical convex optimization with primal-dual splitting. IEEE Trans. Signal Proc. 2015, 63, 373-388.
4. Tibshirani, R.; Saunders, M.; Rosset, S.; Zhu, J.; Knight, K. Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B 2005, 67, 91-108.
5. Condat, L. A direct algorithm for 1d total variation denoising. IEEE Signal Process. Lett. 2013, 20, 1054-1057.
6. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301-320.
7. Starck, J.L.; Murtagh, F.; Fadili, J.M. Sparse Image and Signal Processing, Wavelets, Curvelets. Morphological Diversity; Cambridge University Press: Cambridge, MA, USA, 2010.
8. Cegielski, A. Iterative Methods for Fixed Point Problems in Hilbert Spaces; Lecture Notes in Mathematics; Springer: Heidelberg, Germany, 2012; Volume 2057.
9. Cegielski, A. General method for solving the split common fixed point problem. J. Optim. Theory Appl. 2015, 165, 385-404.
10. Cegielski, A. Application of quasi-nonexpansive operators to an iterative method for variational inequality. SIAM J. Optim. 2015, 25, 2165-2181.
11. Yamada, I.; Ogura, N. Hybrid steepest descent method for variational inequality problem over the fixed point set of certain quasi-nonexpansive mappings. Numer. Funct. Anal. Optimiz. 2004, 25, 619-655.
12. Yamada, I. The hybrid steepest descent method for the variational inequality problem over the intersection of fixed point sets of nonexpansive mappings. In Inherently Parallel Algorithm for Feasibility and Optimization; Butnariu, D., Censor, Y., Reich, S., Eds.; Elsevier: New York, NY, USA, 2001; pp. 473-504.
13. Iiduka, H. Convergence analysis of iterative methods for nonsmooth convex optimization over fixed point sets of quasi-nonexpansive mappings. Math. Program. 2016, 159, 509-538.
14. Iiduka, H. Proximal point algorithms for nonsmooth convex optimization with fixed point constraints. Euro. J. Oper. Res. 2016, 25, 503-513.
15. Censor, Y.; Elfving, T. A multiprojection algorithm using Bregman projections in product space. Numer. Algor. 1994, 8, 221-239.
16. Censor, Y.; Segal, A. The split common fixed-point problem for directed operators. J. Convex Anal. 2009, 16, 587-600.
17. Byrne, C. Iterative oblique projection onto convex sets and the split feasibility problem. Inverse Probl. 2002, 18, 441-453.
18. Censor, Y.; Bortfeld, T.; Martin, B.; Trofimov, A. A unified approach for inversion problems in intensity modulated radiation therapy. Phys. Med. Biol. 2006, 51, 2353-2365.
19. Ansari, Q.H.; Nimana, N.; Petrot, N. Split hierarchical variational inequality problems and related problems. Fixed Point Theory Appl. 2014, 2014, 208.
20. Nimana, N.; Petrot, N. Subgradient algorithm for split hierarchical optimization problem. Carpathian J. Math. 2018, 34, 391-399.
21. Kraikaew, R.; Saejung, S. On split common fixed point problems. J. Math. Anal. Appl. 2014, 415, 513-524.
22. Moudafi, A. The split common fixed-point problem for demicontractive mappings. Inverse Probl. 2010, 26, 055007.
23. Beck, A. First-Order Methods in Optimization; MOS-SIAM Series on Optimization; SIAM: Philadelphia, PA, USA, 2017.
24. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: New York, NY, USA, 2012.
25. Zălinescu, C. Convex Analysis in General Vector Spaces; World Scientific: Singapore, 2002.
26. Cegielski, A.; Nimana, N. Extrapolated cyclic subgradient projection methods for the convex feasibility problems and their numerical behaviour. Optimization 2019, 68, 145-161.
27. Attouch, H.; Czarnecki, M.-O.; Peypouquet, J. Coupling forward-backward with penalty schemes and parallel splitting for constrained variational inequalities. SIAM J. Optim. 2011, 21, 1251-1274.
28. Boţ, R.I.; Csetnek, E.R.; Nimana, N. An inertial proximal-gradient penalization scheme for constrained convex optimization problems. Vietnam J. Math. 2018, 46, 53-71.
29. Noun, N.; Peypouquet, J. Forward-backward penalty scheme for constrained convex minimization without inf-compactness. J. Optim. Theory Appl. 2013, 158, 787-795.
30. Peypouquet, J. Coupling the gradient method with a general exterior penalization scheme for convex minimization. J. Optim. Theory Appl. 2012, 153, 123-138.
Nimit Nimana
Department of Mathematics, Faculty of Science, Khon Kaen University, Khon Kaen 40002, Thailand
© 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
In this work, we consider a bilevel optimization problem consisting of minimizing the sum of two convex functions, one of which is the composition of a convex function with a nonzero linear transformation, over the set of all feasible points represented in the form of common fixed-point sets of nonlinear operators. To find an optimal solution to the problem, we present a fixed-point subgradient splitting method and analyze convergence properties of the proposed method under some additional assumptions. We show how some well-known problems can be solved by using the proposed method. Finally, we present some numerical experiments demonstrating the effectiveness of the obtained theoretical results.