1. Introduction
The main goal herein is to derive an efficient optimization method for the unconstrained minimization of an objective function $f:\mathbb{R}^{n}\to\mathbb{R}$.
In one of the first algorithms for solving unconstrained optimization problems, the steepest descent (gradient) method introduced by Cauchy, the iteration is defined as $x_{k+1}=x_{k}-t_{k}g_{k}$, where $g_{k}=\nabla f(x_{k})$ denotes the gradient of the objective function at the current point and $t_{k}>0$ is a step size.
Furthermore, in the general Newton method the iteration takes the form $x_{k+1}=x_{k}-\nabla^{2}f(x_{k})^{-1}g_{k}$, which requires computing and inverting the Hessian $\nabla^{2}f(x_{k})$ in every step.
In quasi-Newton methods the Hessian of the goal function, or its inverse, is approximated by a suitably defined matrix. Methods of this type generally reduce the computational effort, since the expensive calculation of the exact Hessian is avoided, while still preserving many of the good properties of the Newton method. For these reasons, in this paper we propose a quasi-Newton-type method in which the Hessian is approximated by a scalar matrix defined through an acceleration parameter, and the value of the iterative step size parameter is obtained by a backtracking line search.
In the second section we give an overview of some accelerated gradient methods and hybrid iterations. In the third section we describe the derivation of the hybrid accelerated double direction method and state the corresponding algorithm. In the fourth section we give a convergence analysis of the proposed iteration. Numerical experiments and comparisons are presented in the last section of this paper.
2. Preliminaries: Accelerated Gradient Methods and Hybrid Iterations
The authors in [1] identified a class of accelerated gradient descent methods, defined by a common general iterative scheme.
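As a point of reference, a minimal sketch of an iteration from this family, written in the notation of [1], where $g_{k}=\nabla f(x_{k})$, $t_{k}$ is a step size obtained by backtracking, and $\gamma_{k}>0$ is a scalar acceleration parameter playing the role of a Hessian approximation $\gamma_{k}I$:

$$x_{k+1}=x_{k}-t_{k}\,\gamma_{k}^{-1}g_{k}.$$

The scalar $\gamma_{k}$ is what distinguishes these accelerated schemes from the plain steepest descent step of Cauchy.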
An interesting concept of merging iterations through a hybrid expression was suggested in several research articles (see [6–8]). Some of these representations are given by the following iterations:
In [9] it was proved that the hybrid Picard-Mann iterative process converges faster than the Picard, Mann, and Ishikawa iterative processes for the class of contraction mappings.
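For completeness, the Picard-Mann hybrid iterative process from [9] reads (restated here from that reference; $C$ is a convex subset of a normed space, $T:C\to C$, and $\{\alpha_{k}\}\subset(0,1)$):

$$x_{1}\in C,\qquad y_{k}=(1-\alpha_{k})x_{k}+\alpha_{k}Tx_{k},\qquad x_{k+1}=Ty_{k},\qquad k\ge 1.$$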
3. HADD Algorithm
We are motivated by the advantages confirmed in [3], where the scheme (15) was applied to the SM method; as a result, the hybrid SM model (called HSM) was defined and tested in [3]. Herein, we apply the same hybridization strategy to the accelerated double direction method (shortly, the ADD method) introduced in [4]. The derived scheme is based on the hybrid scheme (15) and thereby keeps the accelerated features of the ADD iteration.
In order to complete the presentation, we start from the ADD iteration introduced in [4]:
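A sketch of this double direction step, restated from [4] in that paper's notation (with $\alpha_{k}$ the step size computed by backtracking, $\gamma_{k}>0$ the acceleration parameter, and $d_{k}$ the second direction vector):

$$x_{k+1}=x_{k}+\alpha_{k}^{2}d_{k}-\alpha_{k}\gamma_{k}^{-1}g_{k}.$$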
Algorithm 1: Procedure Second direction
(calculation of the second direction vector $d_{k}$).
Require: Objective function $f$ and the current iterative point $x_{k}$.
1: Compute the second direction vector $d_{k}$ according to the rule adopted from [4],
where the quantities involved are as defined there.
2: Return $d_{k}$.
Remark 1.
For further investigation within this topic, the second direction vector $d_{k}$ could also be defined in some other suitable way.
Applying the hybrid scheme (15) to the iterative rule (17), we get the hybrid iterative scheme.
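To illustrate the pattern of this construction (a sketch only, with the hybridization parameter denoted here by $\alpha_{h}\in(0,1)$ for illustration), write $T$ for the ADD update operator, $Tx_{k}=x_{k}+\alpha_{k}^{2}d_{k}-\alpha_{k}\gamma_{k}^{-1}g_{k}$. Substituting $T$ into the Picard-Mann process of [9] gives

$$y_{k}=(1-\alpha_{h})x_{k}+\alpha_{h}Tx_{k},\qquad x_{k+1}=Ty_{k},$$

so the new point is obtained by applying the ADD step to a convex combination of $x_{k}$ and $Tx_{k}$.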
To simplify further calculations, we will use a constant value for the hybridization parameter.
Yet, we need to determine the iterative value of the acceleration parameter $\gamma_{k+1}$; as in [1, 4], it is obtained from the second-order Taylor approximation of the objective function at the new iterative point.
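A sketch of the standard construction used for this purpose in this family of methods (cf. [1, 4]): with $\Delta_{k}=x_{k+1}-x_{k}$, the scalar $\gamma_{k+1}$ is chosen so that the quadratic model $f(x_{k})+g_{k}^{T}\Delta_{k}+\frac{1}{2}\gamma_{k+1}\|\Delta_{k}\|^{2}$ reproduces the computed value $f(x_{k+1})$, which yields

$$\gamma_{k+1}=\frac{2\left[f(x_{k+1})-f(x_{k})-g_{k}^{T}\Delta_{k}\right]}{\|\Delta_{k}\|^{2}}.$$

The expression used in the HADD method is presumably the specialization of this formula to the HADD step $\Delta_{k}$.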
With the aim of preserving the Second-Order Necessary Condition and the Second-Order Sufficient Condition, we assume positivity of the acceleration parameter: $\gamma_{k}>0$.
In order to present the main HADD algorithm, we need two additional auxiliary procedures. The first one is the previously displayed Algorithm 1, by which we calculate the second direction vector $d_{k}$; the second one is the backtracking line search procedure, displayed as Algorithm 2, by which we calculate the iterative step length.
Algorithm 3 describes the main procedure, termed the HADD algorithm.
Algorithm 2: The backtracking line search procedure.
Require: Objective function $f$, the search direction $d$ at the current point $x_{k}$, and parameters $0<\sigma<0.5$ and $\beta\in(\sigma,1)$.
1: Set $t=1$.
2: While $f(x_{k}+td)>f(x_{k})+\sigma t g_{k}^{T}d$, take $t:=\beta t$.
3: Return the step length $\alpha_{k}=t$.
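A minimal runnable sketch of this backtracking procedure in Python; the default values of `sigma` and `beta` are illustrative, not values prescribed by the paper.

```python
import numpy as np

def backtracking(f, x, g, d, sigma=1e-4, beta=0.8):
    """Armijo backtracking: shrink t until sufficient decrease along d holds."""
    t, fx = 1.0, f(x)
    # Accept t once f(x + t d) <= f(x) + sigma * t * g^T d
    while f(x + t * d) > fx + sigma * t * g.dot(d):
        t *= beta
    return t

# Example: one steepest descent step on f(x) = x^T x
f = lambda x: x.dot(x)
x = np.array([3.0, -4.0])
g = 2.0 * x
t = backtracking(f, x, g, -g)
print(t, f(x - t * g))  # accepted step length and the decreased function value
```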
Algorithm 3: HADD method.
Require: Objective function $f$, initial point $x_{0}\in\mathbb{R}^{n}$, the constant hybrid parameter, and a stopping tolerance.
1: Set $k=0$, compute $f(x_{0})$ and $g_{0}$, and take $\gamma_{0}=1$.
2: If the stopping criterion is fulfilled, then stop; otherwise, go to the next step.
3: Compute the iterative step length $\alpha_{k}$ by the backtracking procedure in Algorithm 2.
4: Compute the second vector direction $d_{k}$ by Algorithm 1.
5: Compute $x_{k+1}$ using the HADD iteration (20).
6: Calculate the approximation parameter $\gamma_{k+1}$.
7: If $\gamma_{k+1}\le 0$, then take $\gamma_{k+1}=1$.
8: Set $k:=k+1$ and go to Step 2.
9: Return $x_{k+1}$ and $f(x_{k+1})$.
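To show how the steps of Algorithm 3 fit together, the following Python sketch assembles a solver of this type. It is not the authors' implementation: the trial step assumes the ADD-type double direction form from [4] scaled by a constant hybrid factor `psi` (an assumption about the shape of the HADD update (20)), the second direction routine is supplied by the caller, and in this sketch the step length is backtracked on the combined step rather than on a fixed direction.

```python
import numpy as np

def armijo_backtracking(phi, slope, sigma=1e-4, beta=0.8):
    """Backtrack on t until phi(t) <= phi(0) + sigma * t * slope (slope = phi'(0) < 0)."""
    t, phi0 = 1.0, phi(0.0)
    while phi(t) > phi0 + sigma * t * slope:
        t *= beta
    return t

def hadd_like_solver(f, grad, x0, second_direction, psi=0.5, tol=1e-6, max_iter=1000):
    """Schematic driver mirroring the structure of Algorithm 3 (a sketch, not the paper's code)."""
    x = np.asarray(x0, dtype=float)
    gamma = 1.0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:                  # Step 2: stopping criterion
            return x
        d = second_direction(f, grad, x)             # Step 4: Algorithm 1 (problem-specific)
        step = lambda a: (1.0 + psi) * (a * a * d - a * g / gamma)   # assumed hybrid double direction step
        phi = lambda a: f(x + step(a))
        slope = -(1.0 + psi) * g.dot(g) / gamma      # derivative of phi at a = 0
        alpha = armijo_backtracking(phi, slope)      # Step 3: backtracking on the combined step
        delta = step(alpha)
        x_new = x + delta
        # Steps 6-7: acceleration parameter from the second-order model, with safeguard
        gamma = 2.0 * (f(x_new) - f(x) - g.dot(delta)) / delta.dot(delta)
        if gamma <= 0:
            gamma = 1.0
        x = x_new
    return x

# Sanity check on a simple convex quadratic, taking -g as the second direction
f = lambda x: 0.5 * x.dot(x)
grad = lambda x: x
print(hadd_like_solver(f, grad, np.array([4.0, -2.0]), lambda f_, g_, x: -g_(x)))
```

Passing the negative gradient as the second direction, as in the final lines, gives a convenient sanity check on a simple quadratic.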
4. Convergence of the HADD Method
The convergence properties of the established HADD iterative method are considered on the sets of uniformly convex and strictly convex quadratic functions. In the case of uniformly convex functions the statements are the same as those presented in [1, 4]. For that reason, we just restate the following lemma, in which the decrease of the objective function between two successive iterative points of the HADD scheme is estimated. Thereupon, the subsequent theorem confirms linear convergence of our hybrid accelerated model.
Lemma 2.
Suppose the function $f$ is twice continuously differentiable and uniformly convex on $\mathbb{R}^{n}$, and let $\{x_{k}\}$ be generated by the HADD scheme. Then the difference $f(x_{k})-f(x_{k+1})$ is bounded below by a positive multiple of $\|g_{k}\|^{2}$.
Theorem 3.
For a twice continuously differentiable and uniformly convex function $f$, the sequence $\{x_{k}\}$ generated by the HADD method converges at least linearly to the unique minimizer of $f$.
We now show that the iteration (20) is convergent on the set of strictly convex quadratic functions.
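In such analyses the strictly convex quadratic class is typically written as follows (a sketch of the standard setting; the paper's own definition (29) may fix the sign conventions differently):

$$f(x)=\tfrac{1}{2}x^{T}Ax-b^{T}x,\qquad A=A^{T}\succ 0,\qquad g(x)=Ax-b,$$

where the eigenvalues of $A$ satisfy $0<\lambda_{1}\le\cdots\le\lambda_{n}$; the orthonormal eigenvectors mentioned in the proof of Theorem 5 are those of this matrix.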
Lemma 4.
Let
Proof.
Let us calculate the difference of the goal function (29) at two successive iterative points:
Substituting the iteration (20), we continue the computation:
The right-hand side of the previous expression can be further transformed as follows:
After some calculations, we obtain
Theorem 5.
Let the iteration (19) be applied to the strictly convex quadratic function (29), and assume that condition (42) holds. Then the sequence $\{x_{k}\}$ converges to the unique minimizer of $f$.
Proof.
Let us consider an orthonormal system of eigenvectors of the matrix $A$, corresponding to the eigenvalues $\lambda_{1}\le\cdots\le\lambda_{n}$, and expand the gradient $g_{k}$ in this basis.
Applying (20), we further conclude that
(1)
This case implies the following set of inequalities:
As a consequence, we can conclude that
(2)
In this case, one can verify the following estimates:
Remark 6.
The assumption (42) used in the previous theorem is required in order to prove that the HADD process is convergent for strictly convex quadratics. Therewith, knowing that the hybrid parameter is a fixed constant, condition (42) can easily be checked in concrete computations.
5. Computational Tests and Comparisons
The performance of the compared methods, implemented in C++, is tested on a collection of standard unconstrained optimization test problems from [13].
We compare the hybrid accelerated HADD method with its forerunner, the ADD scheme, as well as with the hybrid accelerated HSM method. The number of function evaluations is the performance measure in all tests. The dominance of the ADD method, with respect to the number of iterations, over the other comparative models was confirmed in [4]. However, that research gives no information about the behavior of the ADD method when the number of function evaluations is considered. With respect to this measure, the HSM scheme improves on the accelerated SM method as well as on Nesterov's line search algorithm; see [3]. For these reasons, our experimental goal is to numerically demonstrate the better performance of the HADD method, in terms of the number of function evaluations, compared with the ADD and HSM methods.
In Table 1, we display the number of problems, out of 630, for which an algorithm achieved the minimum number of function evaluations. In the same table, we also display the total number of problems for which all three algorithms achieved an equal number of function evaluations. Based on the results displayed in this table, it is obvious that the HADD scheme convincingly outperforms the other two comparative models.
Table 1
Comparison between the HADD, ADD, and HSM methods regarding the minimal number of function evaluations.
Method | HADD | ADD | HSM | All equal
---|---|---|---|---
Problems with the minimal number of function evaluations | 142 | 0 | 18 | 50
For a clearer visualization of the performance of the HADD algorithm versus the ADD and HSM algorithms, we display in Figure 1 the Dolan-Moré performance profile with respect to the number of function evaluations. As can be seen, the HADD scheme is more robust and therewith more efficient than the other two methods.
[Figure 1: Dolan-Moré performance profiles of the HADD, ADD, and HSM methods with respect to the number of function evaluations; figure omitted, refer to PDF.]

The obtained numerical results confirm that the applied hybridization process is a good way to improve some important characteristics of the chosen accelerated methods. The preferable outcomes of the HADD scheme, regarding the analyzed characteristic, come from the properly chosen constant value of the hybrid parameter.
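For completeness, a Dolan-Moré performance profile of this kind can be computed as in the following sketch; the cost matrix, the solver ordering, and the values of `taus` are illustrative placeholders, not the data behind the reported results.

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profile.

    costs: (n_problems, n_solvers) array of a cost measure
           (here: number of function evaluations); np.inf marks a failure.
    Returns rho of shape (len(taus), n_solvers), where rho[i, s] is the
    fraction of problems solver s solves within a factor taus[i] of the
    best solver on each problem.
    """
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)   # best cost per problem
    ratios = costs / best                     # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# Illustrative data: 4 problems, 3 solvers (HADD, ADD, HSM ordering assumed)
costs = np.array([[120, 180, 150],
                  [ 90, 300,  95],
                  [200, 200, 260],
                  [ 60, 120,  60]])
taus = np.linspace(1.0, 3.0, 5)
print(performance_profile(costs, taus))
```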
6. Conclusion
We present a hybrid accelerated double direction gradient method for solving unconstrained optimization problems. The HADD method is derived by combining the good properties of the hybrid representation introduced in [3] with the double direction optimization model with acceleration parameter presented in [4]. The convergence of the defined optimization model is proved on the sets of uniformly convex and strictly convex quadratic functions.
The HADD scheme preserves the preferable features of both forerunner methods. Moreover, according to the conducted numerical experiments, it outperforms the ADD and HSM methods regarding the required number of function evaluations. We evaluated the Dolan-Moré performance profiles of the comparative methods and showed that the HADD iteration is the most efficient of the three algorithms.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The first and the third authors acknowledge support from the internal research project IS01-17 supported by the Faculty of Sciences and Mathematics, University of Priština, Serbia. The second author gratefully acknowledges support from the Ministry of Education and Science of the Republic of Serbia, Grant no. 174013.
[1] P. S. Stanimirović, M. B. Miladinović, "Accelerated gradient descent methods with line search," Numerical Algorithms, vol. 54, no. 4, pp. 503-520, DOI: 10.1007/s11075-009-9350-8, 2010.
[2] M. J. Petrović, "An accelerated double step size model in unconstrained optimization," Applied Mathematics and Computation, vol. 250, pp. 309-319, DOI: 10.1016/j.amc.2014.10.104, 2015.
[3] M. J. Petrović, V. Rakočević, N. Kontrec, S. Panić, D. Ilić, "Hybridization of accelerated gradient descent method," Numerical Algorithms, 2018.
[4] M. J. Petrović, P. S. Stanimirović, "Accelerated Double Direction Method for Solving Unconstrained Optimization Problems," Mathematical Problems in Engineering, vol. 2014, DOI: 10.1155/2014/965104, 2014.
[5] P. S. Stanimirović, G. V. Milovanović, M. J. Petrović, "A Transformation of Accelerated Double Step Size Method for Unconstrained Optimization," Mathematical Problems in Engineering, vol. 2015, DOI: 10.1155/2015/283679, 2015.
[6] S. Ishikawa, "Fixed points by a new iteration method," Proceedings of the American Mathematical Society, vol. 44, pp. 147-150, DOI: 10.1090/S0002-9939-1974-0336469-5, 1974.
[7] W. R. Mann, "Mean value methods in iteration," Proceedings of the American Mathematical Society, vol. 4, pp. 506-510, DOI: 10.1090/S0002-9939-1953-0054846-3, 1953.
[8] E. Picard, "Mémoire sur la théorie des équations aux dérivées partielles et la méthode des approximations successives," Journal de Mathématiques Pures et Appliquées, vol. 6, pp. 145-210, 1890.
[9] S. H. Khan, "A Picard-Mann hybrid iterative process," Fixed Point Theory and Applications, vol. 2013, article 69, DOI: 10.1186/1687-1812-2013-69, 2013.
[10] N. I. Djuranović-Miličić, M. Gardašević-Filipović, "A multi-step curve search algorithm in nonlinear optimization: nondifferentiable convex case," Facta Universitatis, Series: Mathematics and Informatics, vol. 25, pp. 11-24, 2010.
[11] A. Kumar, D. K. Gupta, E. Martinez, S. Singh, "Directional k-step Newton methods in n variables and its semilocal convergence analysis," Mediterranean Journal of Mathematics, vol. 15, no. 2, DOI: 10.1007/s00009-018-1077-0, 2018.
[12] B. Molina, M. Raydan, "Preconditioned Barzilai-Borwein method for the numerical solution of partial differential equations," Numerical Algorithms, vol. 13, no. 1-2, pp. 45-60, DOI: 10.1007/bf02143126, 1996.
[13] N. Andrei, "An unconstrained optimization test functions collection," Advanced Modeling and Optimization, vol. 10 no. 1, pp. 147-161, 2008.
Copyright © 2018 Milena J. Petrović et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/
Abstract
We present a hybridization of the accelerated gradient method with two vector directions. This hybridization is based on the usage of a chosen three-term hybrid model. The derived hybrid accelerated double direction model keeps the preferable properties of both included methods. Convergence analysis demonstrates at least linear convergence of the proposed iterative scheme on the sets of uniformly convex and strictly convex quadratic functions. The results of numerical experiments confirm a better performance profile in favor of the derived hybrid accelerated double direction model when compared to its forerunners.
Author Affiliations
1 University of Priština, Faculty of Science, Lole Ribara 29, 28000 Kosovska Mitrovica, Serbia
2 University of Niš, Faculty of Science and Mathematics, Višegradska 33, 18000 Niš, Serbia
3 Politehnika, School for New Technologies, Autoput 18, 11000 Belgrade, Serbia