A Novel Value for the Parameter in the

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction and Background Results

The topic of our research is solving the unconstrained nonlinear optimization problem $\begin{matrix} (1) & \min f(x), \quad x \in ℝ^{n}, \end{matrix}$ where the function $f : ℝ^{n} ⟶ ℝ$ is continuously differentiable and bounded below. Following the standard notation, $g_{k} = \nabla f(x_{k})$ denotes the gradient, $s_{k - 1} = x_{k} - x_{k - 1}$ and $y_{k - 1} = g_{k} - g_{k - 1}$ . Using an extended conjugacy condition $\begin{matrix} (2) & d_{k}^{T} y_{k - 1} = - t g_{k}^{T} s_{k - 1}, \quad t > 0, \end{matrix}$ Dai and Liao in [1] proposed the conjugate gradient (CG) method $\begin{matrix} (3) & x_{k + 1} = x_{k} + α_{k} d_{k}, \end{matrix}$ where the step size $α_{k}$ is a positive parameter, $x_{k}$ is an already generated point, $x_{k + 1}$ is a new iterative point, and $d_{k}$ is a suitable search direction. The search directions $d_{k}$ are generated by the conceptual formula $\begin{matrix} (4) & d_{k} = \begin{cases} - g_{0}, & k = 0, \\ - g_{k} + β_{k}^{DL} d_{k - 1}, & k \geq 1, \end{cases} \end{matrix}$ where the conjugate gradient coefficient $β_{k}^{DL}$ is defined by $\begin{matrix} (5) & β_{k}^{DL} = Y(t) ≔ \frac{g_{k}^{T} y_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} - t \frac{g_{k}^{T} s_{k - 1}}{d_{k - 1}^{T} y_{k - 1}}, \quad t > 0, \end{matrix}$ where $t > 0$ is a scalar parameter.

Some well-known formulas for defining $β_{k}$ have been created by modifying the conjugate gradient parameter $β_{k}^{DL}$ [2–9]. One of them is denoted as $β_{k}^{MHSDL}$ and defined in [7] by $\begin{matrix} (6) & β_{k}^{MHSDL} = Y_{1}(t) ≔ \frac{g_{k}^{T} \hat{y}_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} - t \frac{g_{k}^{T} s_{k - 1}}{d_{k - 1}^{T} y_{k - 1}}, \end{matrix}$ where $t > 0$ is a scalar as in (5) and $\hat{y}_{k - 1} = g_{k} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k - 1}$ .
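The two coefficients (5) and (6) differ only in the first numerator. The following is a minimal Python sketch of both, assuming plain lists as vectors; the helper names `dot`, `norm`, `beta_dl`, and `beta_mhsdl` are hypothetical and do not come from [1] or [7].

```python
import math

def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))

def beta_dl(g, g_prev, d_prev, s_prev, t):
    """Dai-Liao coefficient (5): (g^T y - t g^T s) / (d^T y)."""
    y = [a - b for a, b in zip(g, g_prev)]
    return (dot(g, y) - t * dot(g, s_prev)) / dot(d_prev, y)

def beta_mhsdl(g, g_prev, d_prev, s_prev, t):
    """MHSDL coefficient (6): y in the first numerator is replaced by y_hat."""
    y = [a - b for a, b in zip(g, g_prev)]
    scale = norm(g) / norm(g_prev)
    y_hat = [a - scale * b for a, b in zip(g, g_prev)]
    return (dot(g, y_hat) - t * dot(g, s_prev)) / dot(d_prev, y)

# Illustrative data only (not taken from the paper).
g, g_prev = [1.0, 1.0], [0.0, 2.0]
d_prev, s_prev = [0.0, -2.0], [0.0, -1.0]
print(beta_dl(g, g_prev, d_prev, s_prev, 0.1))
print(beta_mhsdl(g, g_prev, d_prev, s_prev, 0.1))
```

Both functions assume $d_{k - 1}^{T} y_{k - 1} \neq 0$ and $g_{k - 1} \neq 0$; a practical code would need to guard these cases.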

The family of CG methods for nonlinear optimization has reached great popularity lately, thanks to the various benefits and advantages it possesses. The most important property is based on computationally efficient iterations arising from a simple CG rule. This property initiates the high efficiency of CG methods with respect to analogous methods for nonlinear optimization. Moreover, global convergence is ensured under suitable conditions. Finally, the application of various CG methods in solving image restoration problems has become an important research topic [10, 11].

Since the parameter $t$ is important for the numerical behavior of Dai-Liao (DL) CG methods [12], one of the most important problems in implementing a DL class CG method is to determine a proper value $t > 0$ which gives desirable results. Considerable effort has been invested in determining the best definition of the nonnegative parameter $t$ in the DL class CG methods. So far, this research has evolved in two directions. One group of methods aims at finding an appropriate fixed value for $t$ [1, 2, 6–8], while methods from another group promote rules for computing values of $t$ in each iteration, which ensure a satisfactory decrease of the objective. In our research, we pay attention to the second research stream: find the parameter $t$ whose values change through iterations so that faster convergence is achieved. The value of the parameter $t$ defined in the $k$ th iteration will be denoted by $t(k) ≔ t_{k}$ .

In order to complete the presentation, we restate the main principles proposed so far for computing $t_{k}$ . Hager and Zhang in [13, 14] proposed the DL CG method (5), known as CG-DESCENT, where $t(k) \equiv t_{k 1}$ is defined by $\begin{matrix} (7) & t(k) \equiv t_{k 1} ≔ 2 \frac{∥ y_{k - 1} ∥^{2}}{y_{k - 1}^{T} s_{k - 1}} . \end{matrix}$

Dai and Kou [15] suggested the conjugate gradient coefficient $β_{k}^{DK}$ of the form $\begin{matrix} (8) & β_{k}^{DK} = Y\left( τ_{k} + \frac{∥ y_{k - 1} ∥^{2}}{y_{k - 1}^{T} s_{k - 1}} - \frac{y_{k - 1}^{T} s_{k - 1}}{∥ s_{k - 1} ∥^{2}} \right) = \frac{g_{k}^{T} y_{k - 1}}{y_{k - 1}^{T} d_{k - 1}} - \left( τ_{k} + \frac{∥ y_{k - 1} ∥^{2}}{y_{k - 1}^{T} s_{k - 1}} - \frac{y_{k - 1}^{T} s_{k - 1}}{∥ s_{k - 1} ∥^{2}} \right) \frac{g_{k}^{T} s_{k - 1}}{d_{k - 1}^{T} y_{k - 1}}, \end{matrix}$ where $τ_{k}$ is the scaling parameter arising from the self-scaling memoryless BFGS method. Clearly, the Dai and Kou (DK) method is a member of the DL class CG methods, which is determined by $\begin{matrix} (9) & t(k) \equiv t_{k 2} ≔ τ_{k} + \frac{∥ y_{k - 1} ∥^{2}}{y_{k - 1}^{T} s_{k - 1}} - \frac{y_{k - 1}^{T} s_{k - 1}}{∥ s_{k - 1} ∥^{2}} . \end{matrix}$

The results given in [15] confirm that the DK iterations outperform many existing CG methods. Following the development of DL methods, Babaie-Kafaki and Ghanbari [16] defined two new ways to calculate the value of the parameter $t$ in (5), as in the following two formulas: $\begin{matrix} (10) & t(k) \equiv t_{k 3} ≔ \frac{s_{k - 1}^{T} y_{k - 1}}{∥ s_{k - 1} ∥^{2}} + \frac{∥ y_{k - 1} ∥}{∥ s_{k - 1} ∥}, \\ & t(k) \equiv t_{k 4} ≔ \frac{∥ y_{k - 1} ∥}{∥ s_{k - 1} ∥} . \end{matrix}$

Andrei in [17] proposed a new rule for calculating $t$ in order to define $Y(t)$ in (5) and defined a new variant of the DL class CG methods, denoted by DLE, with $\begin{matrix} (11) & t(k) \equiv t_{k 5} ≔ \frac{s_{k - 1}^{T} y_{k - 1}}{∥ s_{k - 1} ∥^{2}} . \end{matrix}$
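The rules (7), (10), and (11) depend only on $s_{k - 1}$ and $y_{k - 1}$, so they can be evaluated side by side. The sketch below assumes plain Python lists as vectors and $s_{k - 1}^{T} y_{k - 1} \neq 0$; the function name `t_rules` and the dictionary keys are illustrative choices, not notation from the cited papers.

```python
import math

def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

def t_rules(s, y):
    """Evaluate the iteration-dependent rules (7), (10), and (11) for t."""
    sy, ss, yy = dot(s, y), dot(s, s), dot(y, y)
    return {
        "t_k1": 2.0 * yy / sy,                 # Hager-Zhang rule (7)
        "t_k3": sy / ss + math.sqrt(yy / ss),  # first rule in (10)
        "t_k4": math.sqrt(yy / ss),            # second rule in (10)
        "t_k5": sy / ss,                       # Andrei's rule (11)
    }

# Illustrative data: for f(x) = x_1^2 one has y = 2 s along the first axis.
print(t_rules([1.0, 0.0], [2.0, 0.0]))
```

For this sample pair, `t_k3` is the sum of `t_k5` and `t_k4`, which mirrors the structure of (10) and (11).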

Lotfi and Hosseini in [18] proposed the following rule for determining the parameter $t(k)$ , using the expression $\begin{matrix} (12) & t(k) \equiv t_{k 6} ≔ \max \left\{ t_{k 6}^{*}, υ \frac{∥ y_{k - 1} ∥^{2}}{s_{k - 1}^{T} y_{k - 1}} \right\}, \end{matrix}$ where $\begin{matrix} (13) & t_{k 6}^{*} ≔ \frac{\left( 1 - h_{k} ∥ g_{k - 1} ∥^{r} \right) s_{k - 1}^{T} g_{k} + \frac{g_{k}^{T} y_{k - 1}}{y_{k - 1}^{T} s_{k - 1}} h_{k} ∥ g_{k - 1} ∥^{r} ∥ s_{k - 1} ∥^{2}}{g_{k}^{T} s_{k - 1} + \frac{g_{k}^{T} s_{k - 1}}{s_{k - 1}^{T} y_{k - 1}} h_{k} ∥ g_{k - 1} ∥^{r} ∥ s_{k - 1} ∥^{2}}, \\ & h_{k} = C + \max \left\{ - \frac{s_{k - 1}^{T} y_{k - 1}}{∥ s_{k - 1} ∥^{2}}, 0 \right\} ∥ g_{k - 1} ∥^{- r}, \end{matrix}$ and $υ > 1 / 4$ , $C$ , and $r$ are three positive constants.

On the basis of the above overview of the main CG methods and motivated by the strong theoretical properties and computational efficiency of modified Dai-Liao CG methods proposed by many researchers, we suggest a new way of calculating the value of the parameter $t k$ . As a consequence, the corresponding CG method of DL type, termed as the Effective Dai-Liao (EDL) method, is proposed and its convergence is proven. Numerical testing and comparison with other known DL variants are presented in order to show the effectiveness of the introduced method. Analysis of generated numerical results exhibits that the proposed EDL method is efficient compared with other DL-type methods.

The paper is organized as follows. Introduction, motivation, and a brief overview of the preliminary results are given in Section 1. A new rule for calculating the variable parameter $t(k)$ is proposed in Section 2. An effective algorithm and the global convergence of the EDL method defined by $t(k)$ are given in the same section. The new EDL method is tested in Section 3 on some unconstrained optimization test problems and compared against some known variants of the DL class methods. Finally, concluding remarks are presented in the last section.

2. A Modified Dai-Liao Method and Its Convergence

The continued interest in defining new rules for calculating $t(k)$ suggests that this approach is effective and still insufficiently explored. The idea for defining a new parameter $t_{k}^{*}$ comes from the previously described rules for computing $t(k)$ , particularly from the paper of Li and Ruan [19] and from an idea which can be found in the paper of Yuan et al. [11]. Further, analyzing the results from [1, 2, 6–8], we conclude that the scalar $t$ was defined by a fixed value of $0.1$ in the related numerical experiments. Also, numerical experience related to the fixed value $t = 1$ was reported in [1]. According to this experience, our intention is to define variable values $t(k)$ inside the interval $(0, 1)$ .

To successfully define $t(k)$ with values belonging to the interval $(0, 1)$ , let us start from the definition of the quantity $L_{k}$ which was used in defining the direction $d_{k}$ in [19]. The parameter $L_{k}$ was defined by $L_{k} = s_{k - 1}^{T} s_{k - 1} / s_{k - 1}^{T} y_{k - 1}^{*} \in (0, 1], k \geq 0$ , where $\begin{matrix} (14) & y_{k - 1}^{*} = y_{k - 1} + \left( \max \left\{ 0, - \frac{s_{k - 1}^{T} y_{k - 1}}{∥ s_{k - 1} ∥^{2}} \right\} + 1 \right) s_{k - 1} . \end{matrix}$

By putting $y_{k - 1}^{*}$ into $L_{k}$ , the following can be obtained: $\begin{matrix} (15) & L_{k} = \frac{s_{k - 1}^{T} s_{k - 1}}{s_{k - 1}^{T} \left( y_{k - 1} + \left( \max \left\{ 0, - s_{k - 1}^{T} y_{k - 1} / ∥ s_{k - 1} ∥^{2} \right\} + 1 \right) s_{k - 1} \right)} = \frac{∥ s_{k - 1} ∥^{2}}{s_{k - 1}^{T} y_{k - 1} + \left( \max \left\{ 0, - s_{k - 1}^{T} y_{k - 1} / ∥ s_{k - 1} ∥^{2} \right\} + 1 \right) ∥ s_{k - 1} ∥^{2}} . \end{matrix}$
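As a quick check of the range of $L_{k}$ , and only as a sketch under the assumption $s_{k - 1} \neq 0$ , consider the two possible signs of $s_{k - 1}^{T} y_{k - 1}$ in the denominator of (15). If $s_{k - 1}^{T} y_{k - 1} \geq 0$ , the maximum equals $0$ and the denominator is $s_{k - 1}^{T} y_{k - 1} + ∥ s_{k - 1} ∥^{2} \geq ∥ s_{k - 1} ∥^{2}$ . If $s_{k - 1}^{T} y_{k - 1} < 0$ , the maximum equals $- s_{k - 1}^{T} y_{k - 1} / ∥ s_{k - 1} ∥^{2}$ and the denominator is $s_{k - 1}^{T} y_{k - 1} - s_{k - 1}^{T} y_{k - 1} + ∥ s_{k - 1} ∥^{2} = ∥ s_{k - 1} ∥^{2}$ . In both cases the denominator is at least $∥ s_{k - 1} ∥^{2} > 0$ , so that $0 < L_{k} \leq 1$ .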

Further, with certain modifications and substitutions in the equation defining $L_{k}$ , and using the function $\max$ to choose the larger of the value $d_{k - 1}^{T} g_{k}$ and $1$ , we arrive at a new definition of the parameter $t(k)$ . In line with the restrictions imposed above, the novel parameter $t_{k}^{*}$ is defined by $\begin{matrix} (16) & t_{k}^{*} = \frac{∥ g_{k} ∥^{2}}{\max \left\{ 1, d_{k - 1}^{T} g_{k} \right\} + \left( \max \left\{ 0, d_{k - 1}^{T} g_{k} / ∥ g_{k} ∥^{2} \right\} + 1 \right) ∥ g_{k} ∥^{2}} . \end{matrix}$

It is easy to verify that $t_{k}^{*}$ defined by (16) satisfies $\begin{matrix} (17) & 0 < t_{k}^{*} \leq \frac{∥ g_{k} ∥^{2}}{1 + (0 + 1) ∥ g_{k} ∥^{2}} = \frac{∥ g_{k} ∥^{2}}{1 + ∥ g_{k} ∥^{2}} < 1 . \end{matrix}$

Accordingly, $t_{k}^{*} \in (0, 1)$ , which was our initial intention. Clearly, greater values of $∥ g_{k} ∥$ lead to values $t_{k}^{*} ↗ 1$ . Further, since the trend $∥ g_{k} ∥ ⟶ 0$ is expected, we can expect smaller values $t_{k}^{*} ↘ 0$ in late iterations. Therefore, $t_{k}^{*}$ is suitable for defining the corresponding conjugate gradient coefficient $Y(t)$ or $Y_{1}(t)$ and the resulting DL CG iterations (4).
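The behavior described above can be illustrated with a short Python sketch of the rule for $t_{k}^{*}$ . It assumes plain lists as vectors and $g_{k} \neq 0$ ; the function name `t_star` is a hypothetical choice made for this illustration.

```python
def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

def t_star(g, d_prev):
    """Parameter t_k^* built from g_k and d_{k-1}; requires g != 0."""
    gg = dot(g, g)        # ||g_k||^2
    dg = dot(d_prev, g)   # d_{k-1}^T g_k
    denom = max(1.0, dg) + (max(0.0, dg / gg) + 1.0) * gg
    return gg / denom

# Illustrative data only: a large gradient gives a value close to 1,
# a small gradient gives a value close to 0.
print(t_star([3.0, 4.0], [-1.0, 0.0]))
print(t_star([0.01, 0.0], [-1.0, 0.0]))
```

When $d_{k - 1}^{T} g_{k} \leq 0$ , both maxima take their lower values and the result coincides with the upper bound $∥ g_{k} ∥^{2} / (1 + ∥ g_{k} ∥^{2})$ .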

Considering $t = t_{k}^{*}$ in (6), it is reasonable to propose a novel variant of the Dai-Liao CG parameter $β_{k}^{EDL}$ which is subject to the following rule during the iterative process: $\begin{matrix} (18) & β_{k}^{EDL} = Y_{1}(t_{k}^{*}) ≔ \frac{∥ g_{k} ∥^{2} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} - t_{k}^{*} \frac{g_{k}^{T} s_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} . \end{matrix}$

Before the main algorithm, it is necessary to define the backtracking line search as one of the most popular and practical methods for computing the step length $α_{k}$ in (3). The procedure for the backtracking line search proposed in [20] starts from the initial value $α = 1$ and generates output values which ensure that the goal function decreases in each iteration. Consequently, it is appropriate to use Algorithm 1, restated from [21], in order to determine the primary step size $α_{k}$ .

Algorithm 1:

The backtracking line search.

Require: Nonlinear objective function $f(x)$ , search direction $d_{k}$ , previous point $x_{k}$ , and real quantities $0 < ω < 0.5$ and $φ \in (0, 1)$ .

1: $α = 1$ .

2: While $f(x_{k} + α d_{k}) > f(x_{k}) + ω α g_{k}^{T} d_{k}$ , do $α ≔ α φ$ .

3: Return $α_{k} = α$ .
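The three steps of Algorithm 1 translate almost line by line into code. The following Python sketch assumes plain lists as vectors; the function name `backtracking` and the safety cap `max_shrinks` are additions made for this illustration and are not part of Algorithm 1.

```python
def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

def backtracking(f, x, d, g, omega=1e-4, phi=0.8, max_shrinks=200):
    """Backtracking line search: shrink alpha until the decrease test holds."""
    alpha = 1.0                      # Step 1
    fx = f(x)
    slope = dot(g, d)                # g_k^T d_k, negative for a descent direction
    for _ in range(max_shrinks):     # cap is an assumption, not in Algorithm 1
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx + omega * alpha * slope:
            break
        alpha *= phi                 # Step 2
    return alpha                     # Step 3

# Illustrative use on f(x) = x_1^2 + x_2^2 with the steepest descent direction.
f = lambda x: x[0] ** 2 + x[1] ** 2
x, g = [1.0, 1.0], [2.0, 2.0]
d = [-gi for gi in g]
print(backtracking(f, x, d, g))
```

In this example the full step $α = 1$ only mirrors the point through the minimizer and yields no decrease, so one reduction by $φ = 0.8$ is needed.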

Algorithm 2 describes a computational framework for the EDL method.

It is necessary to examine the properties of the EDL method and prove its convergence.

Assumption 1.

(1) The level set $M = \{ x \in ℝ^{n} ∣ f(x) \leq f(x_{0}) \}$ , defined upon the initial point $x_{0}$ of the iterative method (3), is bounded.

(2) The goal function $f$ is continuous and differentiable in a neighborhood $P$ of $M$ with the Lipschitz continuous gradient $g$ . This assumption implies the existence of a positive constant $L > 0$ satisfying

$\begin{matrix} (19) & ∥ g(u) - g(v) ∥ \leq L ∥ u - v ∥, \quad \forall u, v \in P . \end{matrix}$

Assumption 1 implies the existence of positive constants $D$ and $γ$ satisfying $\begin{matrix} (20) & ∥ u - v ∥ \leq D, \quad \forall u, v \in P, \\ & ∥ g(u) ∥ \leq γ, \quad \forall u \in P . \end{matrix}$

Assume that the conditions from Assumption 1 hold. In view of the uniform convexity of $f$ , there is a constant $θ > 0$ that satisfies $\begin{matrix} (21) & (g(u) - g(v))^{T} (u - v) \geq θ ∥ u - v ∥^{2}, \quad \text{for all } u, v \in M, \end{matrix}$ or equivalently, $\begin{matrix} (22) & f(u) \geq f(v) + g(v)^{T} (u - v) + \frac{θ}{2} ∥ u - v ∥^{2}, \quad \text{for all } u, v \in M . \end{matrix}$

Algorithm 2:

Effective Dai-Liao (EDL) CG method.

Require: An initial point $x_{0}$ and quantities $0 < ε < 1$ , $0 < δ < 1$ .

1: Assign $k = 0$ and $d_{0} = - g_{0}$ .

2: If

$∥ g_{k} ∥ \leq ε$ and $∣ f(x_{k + 1}) - f(x_{k}) ∣ / (1 + ∣ f(x_{k}) ∣) \leq δ$ ,

STOP;

else go to Step 3.

3: Calculate $α_{k} \in (0, 1]$ using Algorithm 1 (backtracking line search).

4: Compute $x_{k + 1} = x_{k} + α_{k} d_{k}$ .

5: Calculate $g_{k + 1}$ , $y_{k} = g_{k + 1} - g_{k}$ , $s_{k} = x_{k + 1} - x_{k}$ .

6: Compute $t_{k}^{*}$ by (16).

7: Calculate $β_{k + 1}^{EDL}$ by (18).

8: Compute $d_{k + 1} = - g_{k + 1} + β_{k + 1}^{EDL} d_{k}$ .

9: Let $k ≔ k + 1$ , and go to Step 2.
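The steps of Algorithm 2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the MATLAB code used in the experiments: vectors are plain lists, the stopping test of Step 2 is evaluated once $x_{k + 1}$ is available, its denominator is taken as $1 + ∣ f(x_{k}) ∣$ , and two safeguards (a restart with $- g_{k}$ when the direction is not a descent direction or when $d_{k}^{T} y_{k} \leq 0$ ) are added here and are not part of Algorithm 2. The name `edl_minimize` is hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def edl_minimize(f, grad, x0, eps=1e-6, delta=1e-16,
                 omega=1e-4, phi=0.8, max_iter=10000):
    """Sketch of the EDL iteration; returns (x, iterations used)."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                          # Step 1: d_0 = -g_0
    for k in range(max_iter):
        fx, slope = f(x), dot(g, d)
        if slope >= 0.0:                           # safeguard (assumption)
            d = [-gi for gi in g]
            slope = dot(g, d)
        alpha = 1.0                                # Step 3: backtracking
        while (alpha > 1e-16 and
               f([xi + alpha * di for xi, di in zip(x, d)])
               > fx + omega * alpha * slope):
            alpha *= phi
        s = [alpha * di for di in d]               # Steps 4-5
        x_new = [xi + si for xi, si in zip(x, s)]
        g_new = grad(x_new)
        y = [a - b for a, b in zip(g_new, g)]
        gg = dot(g_new, g_new)
        small_grad = math.sqrt(gg) <= eps          # Step 2 tests
        small_change = abs(f(x_new) - fx) / (1.0 + abs(fx)) <= delta
        if gg == 0.0 or (small_grad and small_change):
            return x_new, k + 1
        dy = dot(d, y)
        if dy > 0.0:
            dg = dot(d, g_new)                     # Step 6: t_k^*
            t = gg / (max(1.0, dg) + (max(0.0, dg / gg) + 1.0) * gg)
            scale = math.sqrt(gg / dot(g, g))      # ||g_{k+1}|| / ||g_k||
            beta = (gg - scale * dot(g_new, g)     # Step 7: EDL coefficient
                    - t * dot(g_new, s)) / dy
            d = [-gn + beta * di for gn, di in zip(g_new, d)]  # Step 8
        else:                                      # safeguard (assumption)
            d = [-gn for gn in g_new]
        x, g = x_new, g_new                        # Step 9
    return x, max_iter

# Illustrative strictly convex quadratic with minimizer (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]
x_opt, iters = edl_minimize(f, grad, [0.0, 0.0])
print(x_opt, iters)
```

The default values of `eps`, `delta`, `omega`, and `phi` follow the parameter values reported in the numerical experiments; `max_iter` is an arbitrary cap chosen for this sketch.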

It follows from (21) and (22) that $\begin{matrix} (23) & s_{k - 1}^{T} y_{k - 1} \geq θ ∥ s_{k - 1} ∥^{2}, \\ (24) & f(x_{k - 1}) - f(x_{k}) \geq - g(x_{k})^{T} s_{k - 1} + \frac{θ}{2} ∥ s_{k - 1} ∥^{2} . \end{matrix}$

By (19) and (23), one concludes $\begin{matrix} (25) & θ ∥ s_{k - 1} ∥^{2} \leq s_{k - 1}^{T} y_{k - 1} \leq L ∥ s_{k - 1} ∥^{2}, \end{matrix}$ where the inequality implies $θ \leq L$ .

The inequality (25) implies $\begin{matrix} (26) & s_{k - 1}^{T} y_{k - 1} = α_{k - 1} d_{k - 1}^{T} y_{k - 1} > 0 . \end{matrix}$

Taking into account $α_{k - 1} > 0$ and the last inequality, we conclude $\begin{matrix} (27) & d_{k - 1}^{T} y_{k - 1} > 0 . \end{matrix}$
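The bound (23) also gives a quantitative lower bound on $d_{k - 1}^{T} y_{k - 1}$ , which is convenient when the CG coefficient is estimated from above. The short restatement below assumes only $s_{k - 1} = α_{k - 1} d_{k - 1}$ from (3). Substituting this into (23) yields $α_{k - 1} d_{k - 1}^{T} y_{k - 1} = s_{k - 1}^{T} y_{k - 1} \geq θ ∥ s_{k - 1} ∥^{2} = θ α_{k - 1}^{2} ∥ d_{k - 1} ∥^{2}$ , and dividing by $α_{k - 1} > 0$ gives $d_{k - 1}^{T} y_{k - 1} \geq θ α_{k - 1} ∥ d_{k - 1} ∥^{2}$ .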

Lemma 2.

[22, 23]. Let Assumption 1 hold and let the points $x_{k}$ be generated by the method (3)–(4). Then $\begin{matrix} (28) & \sum_{k = 0}^{\infty} \frac{∥ g_{k} ∥^{4}}{∥ d_{k} ∥^{2}} < + \infty . \end{matrix}$

Lemma 3.

Consider the proposed Dai-Liao CG method defined by (3), (4), and (18). If the search procedure guarantees (27) for all $k \geq 0$ , then the inequality $\begin{matrix} (29) & g_{k}^{T} d_{k} \leq - c ∥ g_{k} ∥^{2} \end{matrix}$ holds for some $0 \leq c \leq 1$ .

Proof.

The inequality (29) will be verified by induction. In the initial situation $k = 0$ , one obtains $g_{0}^{T} d_{0} = - ∥ g_{0} ∥^{2}$ . Since $c \leq 1$ , obviously (29) is satisfied in the basic case. Suppose that (29) is valid for some $k \geq 1$ . Taking the inner product of both the left- and right-hand sides in (4) with the vector $g_{k}$ , the following can be obtained: $\begin{matrix} (30) & g_{k}^{T} d_{k} = - ∥ g_{k} ∥^{2} + β_{k}^{EDL} g_{k}^{T} d_{k - 1} \\ & = - ∥ g_{k} ∥^{2} + \left( \frac{∥ g_{k} ∥^{2} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} - t_{k}^{*} \frac{g_{k}^{T} s_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} \right) g_{k}^{T} d_{k - 1} \\ & = - ∥ g_{k} ∥^{2} + \frac{∥ g_{k} ∥^{2} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} g_{k}^{T} d_{k - 1} - t_{k}^{*} \frac{g_{k}^{T} s_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} g_{k}^{T} d_{k - 1} \\ & = - ∥ g_{k} ∥^{2} + \frac{∥ g_{k} ∥^{2} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} g_{k}^{T} d_{k - 1} - t_{k}^{*} \frac{α_{k - 1} (g_{k}^{T} d_{k - 1})^{2}}{d_{k - 1}^{T} y_{k - 1}} . \end{matrix}$

Using (17) together with (27) and $α_{k - 1} > 0$ , we conclude $\begin{matrix} (31) & t_{k}^{*} \frac{α_{k - 1} (g_{k}^{T} d_{k - 1})^{2}}{d_{k - 1}^{T} y_{k - 1}} > 0 . \end{matrix}$

Now from (30), (31), and $\begin{matrix} (32) & 0 \leq β_{k}^{MHS} = \frac{∥ g_{k} ∥^{2} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} \leq \frac{∥ g_{k} ∥^{2}}{λ ∣ g_{k}^{T} d_{k - 1} ∣}, \quad λ \geq 1, \end{matrix}$ it follows that $\begin{matrix} (33) & g_{k}^{T} d_{k} \leq - ∥ g_{k} ∥^{2} + \frac{∥ g_{k} ∥^{2} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} g_{k}^{T} d_{k - 1} \leq - ∥ g_{k} ∥^{2} + \frac{∥ g_{k} ∥^{2}}{λ g_{k}^{T} d_{k - 1}} g_{k}^{T} d_{k - 1} = - \left( 1 - \frac{1}{λ} \right) ∥ g_{k} ∥^{2} . \end{matrix}$

In view of $λ \geq 1$ , the inequality (29) is satisfied for $c = 1 - 1 / λ$ in (33) and arbitrary $k \geq 0$ .

The global convergence of the proposed EDL method is confirmed by Theorem 4.

Theorem 4.

Let Assumption 1 be true and $f$ be uniformly convex. Then, the sequence $\{ x_{k} \}$ generated by (3), (4), and (18) fulfills $\begin{matrix} (34) & \underset{k \to \infty}{\lim \inf} ∥ g_{k} ∥ = 0 . \end{matrix}$

Proof.

Suppose the opposite, i.e., (34) is not true. This implies the existence of a constant $c_{1} > 0$ such that $\begin{matrix} (35) & ∥ g_{k} ∥ \geq c_{1}, \quad \text{for all } k . \end{matrix}$

Squaring both sides of (4) implies $\begin{matrix} (36) & ∥ d_{k} ∥^{2} = ∥ g_{k} ∥^{2} - 2 β_{k}^{EDL} g_{k}^{T} d_{k - 1} + (β_{k}^{EDL})^{2} ∥ d_{k - 1} ∥^{2} . \end{matrix}$

Taking into account (18), we can get $\begin{matrix} (37) & - 2 β_{k}^{EDL} g_{k}^{T} d_{k - 1} = - 2 \left( \frac{∥ g_{k} ∥^{2} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} - t_{k}^{*} \frac{g_{k}^{T} s_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} \right) g_{k}^{T} d_{k - 1} \\ & = - 2 \left( \frac{∥ g_{k} ∥^{2} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} g_{k}^{T} d_{k - 1} - t_{k}^{*} \frac{α_{k - 1} (g_{k}^{T} d_{k - 1})^{2}}{d_{k - 1}^{T} y_{k - 1}} \right) . \end{matrix}$

Now from (31) and (32), it follows that $\begin{matrix} (38) & - 2 β_{k}^{EDL} g_{k}^{T} d_{k - 1} \leq 2 \frac{∥ g_{k} ∥^{2} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} g_{k}^{T} d_{k - 1} \leq 2 \frac{∥ g_{k} ∥^{2}}{λ g_{k}^{T} d_{k - 1}} g_{k}^{T} d_{k - 1} = 2 \frac{∥ g_{k} ∥^{2}}{λ} . \end{matrix}$

Now, an application of (18) gives $\begin{matrix} (39) & β_{k}^{EDL} = \frac{∥ g_{k} ∥^{2} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1} - t_{k}^{*} g_{k}^{T} s_{k - 1}}{d_{k - 1}^{T} y_{k - 1}} \\ & \leq \frac{∣ g_{k}^{T} g_{k} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k}^{T} g_{k - 1} - t_{k}^{*} g_{k}^{T} s_{k - 1} ∣}{d_{k - 1}^{T} y_{k - 1}} \\ & \leq \frac{∣ g_{k}^{T} \left( g_{k} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k - 1} - t_{k}^{*} s_{k - 1} \right) ∣}{θ α_{k - 1} ∥ d_{k - 1} ∥^{2}} \\ & = \frac{∣ g_{k}^{T} \left( g_{k} - g_{k - 1} + g_{k - 1} - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} g_{k - 1} - t_{k}^{*} s_{k - 1} \right) ∣}{θ α_{k - 1} ∥ d_{k - 1} ∥^{2}} \\ & \leq \frac{∥ g_{k} ∥ \left( ∥ g_{k} - g_{k - 1} ∥ + ∥ g_{k - 1} \left( 1 - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} \right) ∥ + t_{k}^{*} ∥ s_{k - 1} ∥ \right)}{θ α_{k - 1} ∥ d_{k - 1} ∥^{2}} \\ & = \frac{∥ g_{k} ∥ \left( ∥ g_{k} - g_{k - 1} ∥ + ∣ 1 - \frac{∥ g_{k} ∥}{∥ g_{k - 1} ∥} ∣ ∥ g_{k - 1} ∥ + t_{k}^{*} ∥ s_{k - 1} ∥ \right)}{θ α_{k - 1} ∥ d_{k - 1} ∥^{2}} \\ & = \frac{∥ g_{k} ∥ \left( ∥ g_{k} - g_{k - 1} ∥ + ∣ ∥ g_{k - 1} ∥ - ∥ g_{k} ∥ ∣ + t_{k}^{*} ∥ s_{k - 1} ∥ \right)}{θ α_{k - 1} ∥ d_{k - 1} ∥^{2}} \\ & \leq \frac{∥ g_{k} ∥ \left( ∥ g_{k} - g_{k - 1} ∥ + ∥ g_{k - 1} - g_{k} ∥ + t_{k}^{*} ∥ s_{k - 1} ∥ \right)}{θ α_{k - 1} ∥ d_{k - 1} ∥^{2}} \\ & = \frac{∥ g_{k} ∥ \left( 2 ∥ g_{k} - g_{k - 1} ∥ + t_{k}^{*} ∥ s_{k - 1} ∥ \right)}{θ α_{k - 1} ∥ d_{k - 1} ∥^{2}} \\ & \leq \frac{∥ g_{k} ∥ \left( 2 L ∥ s_{k - 1} ∥ + t_{k}^{*} ∥ s_{k - 1} ∥ \right)}{θ α_{k - 1} ∥ d_{k - 1} ∥^{2}} \\ & = \frac{(2 L + t_{k}^{*}) ∥ g_{k} ∥ ∥ s_{k - 1} ∥}{θ α_{k - 1} ∥ d_{k - 1} ∥^{2}} = \frac{(2 L + t_{k}^{*}) ∥ g_{k} ∥ α_{k - 1} ∥ d_{k - 1} ∥}{θ α_{k - 1} ∥ d_{k - 1} ∥^{2}} = \frac{(2 L + t_{k}^{*}) ∥ g_{k} ∥}{θ ∥ d_{k - 1} ∥} . \end{matrix}$

Using $t_{k}^{*} \in (0, 1)$ and (38) and (39) in (36), we obtain $\begin{matrix} (40) & ∥ d_{k} ∥^{2} \leq ∥ g_{k} ∥^{2} + 2 \frac{∥ g_{k} ∥^{2}}{λ} + \frac{(2 L + t_{k}^{*})^{2} ∥ g_{k} ∥^{2}}{θ^{2} ∥ d_{k - 1} ∥^{2}} ∥ d_{k - 1} ∥^{2} \\ & = ∥ g_{k} ∥^{2} + 2 \frac{∥ g_{k} ∥^{2}}{λ} + \frac{(2 L + t_{k}^{*})^{2}}{θ^{2}} ∥ g_{k} ∥^{2} \\ & = \left( 1 + \frac{2}{λ} + \frac{(2 L + t_{k}^{*})^{2}}{θ^{2}} \right) ∥ g_{k} ∥^{2} \\ & = \left( \frac{λ + 2}{λ} + \frac{(2 L + t_{k}^{*})^{2}}{θ^{2}} \right) ∥ g_{k} ∥^{2} \\ & = \frac{(λ + 2) θ^{2} + λ (2 L + t_{k}^{*})^{2}}{λ θ^{2}} ∥ g_{k} ∥^{2} . \end{matrix}$

Next, dividing both sides of (40) by $∥ g_{k} ∥^{4}$ and using (35), it can be concluded that $\begin{matrix} (41) & \frac{∥ d_{k} ∥^{2}}{∥ g_{k} ∥^{4}} \leq \frac{(λ + 2) θ^{2} + λ (2 L + t_{k}^{*})^{2}}{λ θ^{2}} \cdot \frac{1}{c_{1}^{2}}, \\ & \frac{∥ g_{k} ∥^{4}}{∥ d_{k} ∥^{2}} \geq \frac{λ θ^{2} \cdot c_{1}^{2}}{(λ + 2) θ^{2} + λ (2 L + t_{k}^{*})^{2}} . \end{matrix}$

The inequalities in (41) imply $\begin{matrix} (42) & \sum_{k = 0}^{\infty} \frac{∥ g_{k} ∥^{4}}{∥ d_{k} ∥^{2}} \geq \sum_{k = 0}^{\infty} \frac{λ θ^{2} \cdot c_{1}^{2}}{(λ + 2) θ^{2} + λ (2 L + t_{k}^{*})^{2}} = \infty . \end{matrix}$

Therefore, $∥ g_{k} ∥ \geq c_{1}$ causes a contradiction with Lemma 2.

3. Numerical Experiments

The implementation of the EDL method is based on Algorithm 2. This section analyzes and compares the numerical results obtained by the EDL method and four variants of the MHSDL class methods (6). These variants are defined by $t \equiv t_{k 3}$ , $t \equiv t_{k 4}$ , $t \equiv t_{k 5}$ , and $t \equiv t_{k 6}$ and denoted, respectively, as MHSDL3, MHSDL4, MHSDL5, and MHSDL6. The obtained results are not compared with the values $t_{k 1}$ and $t_{k 2}$ , because the authors of [16] have already shown that $t_{k 3}$ and $t_{k 4}$ yield better numerical performance than $t_{k 1}$ and $t_{k 2}$ .

The codes used in the testing experiments for the above methods are written in MATLAB R2017a and executed on the Intel Core i3 2.0 GHz workstation with the Windows 10 operating system. Three important criteria are analyzed in each individual test case: number of iterations (NI), number of function evaluations (NFE), and processor time (CPU).

The numerical experiment is performed using 28 test functions presented in [24], where many of the problems are taken from the CUTEr collection [25]. For each objective function, all methods start from the same initial point $x_{0}$ . Each function is tested 10 times with gradually increasing dimensions $n = 100$ , 500, 1000, 3000, 5000, 7000, 8000, 10000, 15000, and 20000.

The uniform terminating criteria for each of the five considered algorithms (EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6) are $\begin{matrix} (43) & ∥ g_{k} ∥ \leq ε, \\ & \frac{∣ f(x_{k + 1}) - f(x_{k}) ∣}{1 + ∣ f(x_{k}) ∣} \leq δ, \end{matrix}$ where $ε = 10^{- 6}$ and $δ = 10^{- 16}$ . The backtracking line search is based on the parameters $ω = 0.0001$ and $φ = 0.8$ for all five algorithms. Specific parameters used only in the MHSDL6 method are defined as $C = 1$ , $υ = 0.26$ , and $r = r_{k} = υ ∥ g_{k - 1} ∥$ .
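The two tests in (43) can be combined into a single predicate. The Python sketch below assumes that both conditions must hold simultaneously, as in Step 2 of Algorithm 2, and that the denominator of the second test is $1 + ∣ f(x_{k}) ∣$ ; the function name `should_stop` is hypothetical.

```python
import math

def should_stop(g, f_new, f_old, eps=1e-6, delta=1e-16):
    """Return True when both termination tests of (43) are satisfied."""
    grad_norm = math.sqrt(sum(gi * gi for gi in g))
    small_grad = grad_norm <= eps
    small_change = abs(f_new - f_old) / (1.0 + abs(f_old)) <= delta
    return small_grad and small_change

# Illustrative values only.
print(should_stop([1e-7, 0.0], 1.0, 1.0))   # small gradient, no change in f
print(should_stop([1e-3, 0.0], 1.0, 1.0))   # gradient still too large
```

With $δ = 10^{- 16}$ the second test effectively requires that consecutive function values agree to machine precision relative to $1 + ∣ f(x_{k}) ∣$ .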

Summary numerical results for the EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6 methods, executed on 28 test functions, are arranged in Tables 1–3, which report the NI, NFE, and CPU criteria, respectively.

Table 1

Summary results of EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6 methods with respect to NI.

Test function	MHSDL3	MHSDL4	MHSDL5	EDL	MHSDL6
Extended penalty	1466	2243	2231	1259	1371
Perturbed quadratic	1203710	754291	746557	305622	423037
Raydan 1	159055	110587	106586	55477	75154
Raydan 2	1636	441	441	70	209
Diagonal 1	116788	78844	73512	30978	20332
Diagonal 2	176983	270434	271595	515000	271295
Diagonal 3	150328	98647	104417	47155	37711
Hager	8666	5219	5157	3234	3625
Generalized tridiagonal 1	1862	1471	1485	639	877
Extended TET	1357	5954	5915	4030	2664
Diagonal 4	30693	19589	19332	8040	12012
Diagonal 5	1721	25120	25120	60	216
Extended Himmelblau	1777	8023	7946	1376	3682
Perturbed quadratic diagonal	2940970	2115659	2027128	1136414	1352704
Quadratic QF1	1270802	799192	786032	309509	325415
Extended quadratic penalty QP1	770	594	575	560	543
Extended quadratic penalty QP2	399671	240530	245254	96620	137799
Extended quadratic exponential EP1	462	606	606	513	526
Extended tridiagonal 2	3119	2176	2177	1132	1455
ARWHEAD (CUTE)	88824	69868	67413	40713	48669
ENGVAL1 (CUTE)	2323	1407	1415	552	820
INDEF (CUTE)	20	31	1080	23	36240
QUARTC (CUTE)	173913	262291	262291	524299	262181
Diagonal 6	1824	508	508	70	227
Generalized quartic	1208	1403	2846	1265	1154
Diagonal 7	3217	655	655	653	580
Diagonal 8	511	698	698	686	596
Full Hessian FH3	1456	5353	5350	2523	3176

Table 2

Summary results of EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6 methods with respect to NFE.

Test function	MHSDL3	MHSDL4	MHSDL5	EDL	MHSDL6
Extended penalty	54876	73764	73429	46820	49791
Perturbed quadratic	56691737	34287604	33885701	13168688	18486375
Raydan 1	5066739	3364983	3236335	1551846	2170553
Raydan 2	6554	1162	1162	159	428
Diagonal 1	5004640	3256274	3022015	1200086	744278
Diagonal 2	353976	540878	543200	1030010	542600
Diagonal 3	6339146	3998904	4229565	1798032	1400076
Hager	192474	107413	106534	59187	69735
Generalized tridiagonal 1	37429	27860	28138	10760	15177
Extended TET	19546	77422	76925	40340	29334
Diagonal 4	713120	425023	418666	155027	242443
Diagonal 5	6874	50460	50460	140	442
Extended Himmelblau	45972	192362	190524	26104	80854
Perturbed quadratic diagonal	135901222	94177165	90238441	48147512	57702654
Quadratic QF1	55972697	33836473	33243711	12316721	12853424
Extended quadratic penalty QP1	17016	12882	12565	11116	10544
Extended quadratic penalty QP2	13015888	7454686	7584960	2743358	4030601
Extended quadratic exponential EP1	14914	18463	18463	14132	15133
Extended tridiagonal 2	36450	22564	22379	9687	12920
ARWHEAD (CUTE)	4296028	3305257	3182138	1846606	2230650
ENGVAL1 (CUTE)	40462	22432	22898	8209	12858
INDEF (CUTE)	1808	2182	5995	2060	104962
QUARTC (CUTE)	347926	524662	524662	1048648	524422
Diagonal 6	7394	1416	1408	159	468
Generalized quartic	14364	21842	48770	16695	14103
Diagonal 7	6454	6838	6838	3891	4521
Diagonal 8	6098	6938	6938	4161	5494
Full Hessian FH3	60792	212799	212701	89890	114962

Table 3

Summary results of EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6 methods with respect to CPU time (sec).

Test function	MHSDL3	MHSDL4	MHSDL5	EDL	MHSDL6
Extended penalty	29.75	34.11	31.42	18.30	24.27
Perturbed quadratic	40532.66	24358.20	24947.84	8335.80	13225.80
Raydan 1	3054.67	1904.48	1692.06	690.91	1184.86
Raydan 2	6.77	1.58	1.66	0.31	0.77
Diagonal 1	7834.03	5106.41	4592.28	1476.89	486.09
Diagonal 2	885.13	1428.05	1447.02	2352.11	1513.50
Diagonal 3	13614.27	8416.77	9064.30	3132.02	1916.30
Hager	586.63	325.75	333.41	142.06	198.13
Generalized tridiagonal 1	66.14	35.59	34.42	15.19	21.63
Extended TET	20.50	78.34	82.94	41.23	31.45
Diagonal 4	134.53	77.86	87.88	30.41	55.34
Diagonal 5	18.06	134.73	121.09	0.56	1.84
Extended Himmelblau	11.13	44.47	44.36	6.19	18.30
Perturbed quadratic diagonal	91655.55	58226.16	60920.06	32179.38	36383.83
Quadratic QF1	62610.50	31552.48	28679.91	8832.11	8465.34
Extended quadratic penalty QP1	7.56	7.25	6.98	4.98	4.94
Extended quadratic penalty QP2	3814.16	2128.86	2288.55	671.52	1204.72
Extended quadratic exponential EP1	9.11	10.23	8.55	8.00	8.02
Extended tridiagonal 2	11.13	8.83	6.95	4.08	5.25
ARWHEAD (CUTE)	2709.42	2336.92	2369.28	1266.80	1689.80
ENGVAL1 (CUTE)	19.47	11.33	11.81	4.03	6.70
INDEF (CUTE)	2.44	2.89	10.70	1.92	774.34
QUARTC (CUTE)	3106.56	4818.58	4808.70	7138.72	4735.39
Diagonal 6	6.75	1.92	2.03	0.38	1.34
Generalized quartic	7.16	11.53	21.05	7.53	9.78
Diagonal 7	5.98	8.20	8.28	4.56	6.25
Diagonal 8	6.17	8.20	8.08	4.72	7.69
Full Hessian FH3	30.08	66.45	79.48	35.77	43.42

We utilized the performance profile given in [26] to compare numerical results for three criteria (NI, NFE, and CPU) generated by five methods (EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6). The upper curve of the selected performance profile corresponds to the method that shows the best performance.
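For readers unfamiliar with performance profiles, the following Python sketch shows how one point of a profile can be computed: for a factor $τ \geq 1$ , it returns the fraction of problems on which each solver's cost is within $τ$ times the best cost. The function name `performance_profile` and the data layout are assumptions made for this illustration; the sample costs are the NI values of three rows of Table 1 for three of the methods, and failures are not handled.

```python
def performance_profile(costs, tau):
    """Fraction of problems each solver handles within tau times the best cost.

    costs maps a solver name to a list of positive costs, one per problem,
    with the problems listed in the same order for every solver.
    """
    solvers = list(costs)
    n = len(costs[solvers[0]])
    best = [min(costs[s][p] for s in solvers) for p in range(n)]
    return {
        s: sum(1 for p in range(n) if costs[s][p] <= tau * best[p]) / n
        for s in solvers
    }

# NI values from Table 1: Extended penalty, Diagonal 1, Extended TET.
ni = {
    "MHSDL3": [1466, 116788, 1357],
    "EDL": [1259, 30978, 4030],
    "MHSDL6": [1371, 20332, 2664],
}
print(performance_profile(ni, 1.0))
print(performance_profile(ni, 2.0))
```

The value at $τ = 1$ is the share of problems on which a solver attains the best cost, and the profile is nondecreasing in $τ$ .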

Figures 1–3 plot the performance profiles for the numerical values included in Tables 1–3, respectively. Figure 1 presents the performance profiles of the NI criterion generated by the EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6 methods. In this figure, it is noticeable that EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6 methods solved all tested functions, wherein the EDL method shows the best performances in 57.14% of test functions compared with MHSDL3 (25.00%), MHSDL4 (0.00%), MHSDL5 (0.00%), and MHSDL6 (17.86%). From Figure 1, it is observable that the graph of the EDL method comes first to the top, which means that the EDL outperforms other considered methods with respect to the NI.

[figure omitted; refer to PDF]

Figure 2 presents the performance profiles of the NFE of the EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6 methods. It is observable that EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6 generated solutions to all tested cases, and the EDL method is the best in 67.86% of the functions compared with MHSDL3 (17.86%), MHSDL4 (0.00%), MHSDL5 (0.00%), and MHSDL6 (14.28%). From Figure 2, it is observed that the EDL graph first comes to the top, which confirms that the EDL is the winner with respect to the NFE.

Figure 3 contains graphs of the performance profiles corresponding to the CPU time of the EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6 methods. It is obvious that EDL, MHSDL3, MHSDL4, MHSDL5, and MHSDL6 solved all tested functions. Further analysis gives that the EDL method is the winner in 67.86% of the test cases compared with MHSDL3 (17.86%), MHSDL4 (0.00%), MHSDL5 (0.00%), and MHSDL6 (14.28%). Figure 3 demonstrates that the graph of the EDL method first comes to level 1, which indicates its superiority with respect to the CPU time.

From the previous analysis of the results shown in Tables 1–3 and Figures 1–3, it can be concluded that the EDL method produces the best results among the compared methods in terms of all three basic metrics: NI, NFE, and CPU.

4. Conclusion

A novel rule which determines the value $t(k)$ of the parameter $t$ in each iteration of the Dai-Liao-type CG method is presented. The proposed expression for defining $t(k)$ is denoted by $t_{k}^{*}$ . Considering $t = t_{k}^{*}$ in (6), a novel variant of the Dai-Liao CG parameter $β_{k}^{EDL}$ is defined and a novel Effective Dai-Liao (EDL) conjugate gradient method is proposed. The convergence of the EDL method is investigated, and the global convergence on a class of uniformly convex functions is established. By numerical testing, we have shown that the size of the scalar $t_{k}^{*}$ has a significant influence on the convergence speed of the EDL method. Numerical comparisons on large-scale unconstrained optimization test functions of different structures and complexities confirm the computational efficiency of the EDL algorithm and its superiority over the previously known DL CG variants, such as MHSDL3, MHSDL4, MHSDL5, and MHSDL6. During the testing, we tracked the number of iterations (NI), number of function evaluations (NFE), and elapsed processor time (CPU) for each function and each method. Analysis of the obtained performance profiles introduced by Dolan and Moré revealed that the EDL method is the most efficient.

We are convinced that the obtained results will be a motivation for further research in defining new values of the parameter $t_{k}$ in the Dai-Liao CG methods. Future research would include finding more efficient rules to calculate the parameter $t_{k}$ during the iterative process. We hope that our proposal of the new expression for defining the parameter $t$ will initiate further research in that direction. Finding novel approaches to defining the values of $t$ and the conjugate gradient parameter $β_{k}$ remains an open topic for scientific research, and our approach is only one possible direction in this research.

Acknowledgments

The research was supported by the National Natural Science Foundation of China (Grant Nos. 11971142, 11871202, 61673169, 11701176, 11626101, and 11601485).

References

[1] Y. -H. Dai, L. -Z. Liao, "New conjugacy conditions and related nonlinear conjugate gradient methods," Applied Mathematics and Optimization, vol. 43 no. 1, pp. 87-101, DOI: 10.1007/s002450010019, 2001.

[2] Y. Cheng, Q. Mou, X. Pan, S. Yao, "A sufficient descent conjugate gradient method and its global convergence," Optimization Methods and Software, vol. 31 no. 3, pp. 577-590, DOI: 10.1080/10556788.2015.1124431, 2016.

[3] I. E. Livieris, P. Pintelas, "A descent Dai-Liao conjugate gradient method based on a modified secant equation and its global convergence," ISRN Computational Mathematics, vol. 2012, DOI: 10.5402/2012/435495, 2012.

[4] M. R. Peyghami, H. Ahmadzadeh, A. Fazli, "A new class of efficient and globally convergent conjugate gradient methods in the Dai-Liao family," Optimization Methods and Software, vol. 30 no. 4, pp. 843-863, DOI: 10.1080/10556788.2014.1001511, 2015.

[5] H. Yabe, M. Takano, "Global convergence properties of nonlinear conjugate gradient methods with modified secant condition," Computational Optimization and Applications, vol. 28 no. 2, pp. 203-225, DOI: 10.1023/B:COAP.0000026885.81997.88, 2004.

[6] S. Yao, B. Qin, "A hybrid of DL and WYL nonlinear conjugate gradient methods," Abstract and Applied Analysis, vol. 2014, DOI: 10.1155/2014/279891, 2014.

[7] S. Yao, X. Lu, Z. Wei, "A conjugate gradient method with global convergence for large-scale unconstrained optimization problems," Journal of Applied Mathematics, vol. 2013, DOI: 10.1155/2013/730454, 2013.

[8] Y. Zheng, B. Zheng, "Two new Dai-Liao-type conjugate gradient methods for unconstrained optimization problems," Journal of Optimization Theory and Applications, vol. 175 no. 2, pp. 502-509, DOI: 10.1007/s10957-017-1140-1, 2017.

[9] W. Zhou, L. Zhang, "A nonlinear conjugate gradient method based on the MBFGS secant condition," Optimization Methods and Software, vol. 21 no. 5, pp. 707-714, DOI: 10.1080/10556780500137041, 2006.

[10] W. Hu, J. Wu, G. Yuan, "Some modified Hestenes-Stiefel conjugate gradient algorithms with application in image restoration," Applied Numerical Mathematics, vol. 158, pp. 360-376, DOI: 10.1016/j.apnum.2020.08.009, 2020.

[11] G. Yuan, T. Li, W. Hu, "A conjugate gradient algorithm for large-scale nonlinear equations and image restoration problems," Applied Numerical Mathematics, vol. 147, pp. 129-141, DOI: 10.1016/j.apnum.2019.08.022, 2020.

[12] N. Andrei, "Open problems in nonlinear conjugate gradient algorithms for unconstrained optimization," Bulletin of the Malaysian Mathematical Sciences Society, vol. 34 no. 2, pp. 319-330, 2011.

[13] W. W. Hager, H. Zhang, "A new conjugate gradient method with guaranteed descent and an efficient line search," SIAM Journal on Optimization, vol. 16 no. 1, pp. 170-192, DOI: 10.1137/030601880, 2005.

[14] W. W. Hager, H. Zhang, "Algorithm 851," ACM Transactions on Mathematical Software, vol. 32 no. 1, pp. 113-137, DOI: 10.1145/1132973.1132979, 2006.

[15] Y.-H. Dai, C.-X. Kou, "A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search," SIAM Journal on Optimization, vol. 23 no. 1, pp. 296-320, DOI: 10.1137/100813026, 2013.

[16] S. Babaie-Kafaki, R. Ghanbari, "The Dai-Liao nonlinear conjugate gradient method with optimal parameter choices," European Journal of Operational Research, vol. 234 no. 3, pp. 625-630, DOI: 10.1016/j.ejor.2013.11.012, 2014.

[17] N. Andrei, "A Dai-Liao conjugate gradient algorithm with clustering of eigenvalues," Numerical Algorithms, vol. 77 no. 4, pp. 1273-1282, DOI: 10.1007/s11075-017-0362-5, 2018.

[18] M. Lotfi, S. M. Hosseini, "An efficient Dai-Liao type conjugate gradient method by reformulating the CG parameter in the search direction equation," Journal of Computational and Applied Mathematics, vol. 371, article 112708, DOI: 10.1016/j.cam.2019.112708, 2020.

[19] X. Li, Q. Ruan, "A modified PRP conjugate gradient algorithm with trust region for optimization problems," Numerical Functional Analysis and Optimization, vol. 32 no. 5, pp. 496-506, DOI: 10.1080/01630563.2011.554948, 2011.

[20] N. Andrei, "An acceleration of gradient descent algorithm with backtracking for unconstrained optimization," Numerical Algorithms, vol. 42 no. 1, pp. 63-73, DOI: 10.1007/s11075-006-9023-9, 2006.

[21] P. S. Stanimirović, M. B. Miladinović, "Accelerated gradient descent methods with line search," Numerical Algorithms, vol. 54 no. 4, pp. 503-520, DOI: 10.1007/s11075-009-9350-8, 2010.

[22] W. Cheng, "A two-term PRP-based descent method," Numerical Functional Analysis and Optimization, vol. 28 no. 11–12, pp. 1217-1230, DOI: 10.1080/01630560701749524, 2007.

[23] G. Zoutendijk, "Nonlinear programming, computational methods," Integer and Nonlinear Programming, North-Holland, pp. 37-86, 1970.

[24] N. Andrei, "An unconstrained optimization test functions collection," Advanced Modeling and Optimization, vol. 10 no. 1, pp. 147-161, 2008.

[25] I. Bongartz, A. R. Conn, N. Gould, P. L. Toint, "CUTE: constrained and unconstrained testing environments," ACM Transactions on Mathematical Software, vol. 21 no. 1, pp. 123-160, DOI: 10.1145/200979.201043, 1995.

[26] E. D. Dolan, J. J. Moré, "Benchmarking optimization software with performance profiles," Mathematical Programming, vol. 91 no. 2, pp. 201-213, DOI: 10.1007/s101070100263, 2002.

Copyright © 2021 Branislav Ivanov et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/

Abstract

A new rule for calculating the parameter $t$ involved in each iteration of the MHSDL (Dai-Liao) conjugate gradient (CG) method is presented. The new value of the parameter yields a more efficient and robust variant of the Dai-Liao algorithm. Under suitable conditions, theoretical analysis shows that the proposed method, combined with backtracking line search, is globally convergent. Numerical experiments confirm the influence of the new value of the parameter $t$ on the behavior of the underlying CG optimization method. Numerical comparisons, analyzed using Dolan and Moré’s performance profiles, show that the novel method performs better with respect to all three analyzed characteristics: number of iterative steps, number of function evaluations, and CPU time.

Details

Title

A Novel Value for the Parameter in the Dai-Liao-Type Conjugate Gradient Method

Author

Ivanov, Branislav¹; Stanimirović, Predrag S²; Shaini, Bilall I³; Ahmad, Hijaz⁴; Wang, Miao-Kun⁵

¹ Technical Faculty in Bor, University of Belgrade, Vojske Jugoslavije 12, 19210 Bor, Serbia
² Faculty of Sciences and Mathematics, University of Niš, Višegradska 33, 18000 Niš, Serbia
³ University of Tetovo, St. Ilinden, n.n., Tetovo, North Macedonia
⁴ Department of Basic Sciences, University of Engineering and Technology Peshawar, Pakistan
⁵ Department of Mathematics, Huzhou University, Huzhou 313000, China

Editor

Ioan Rasa

Publication year

2021

Publication date

2021

Publisher

John Wiley & Sons, Inc.

ISSN

23148896

e-ISSN

23148888

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1155/2021/6693401

ProQuest document ID

2487053123
