Convergence Analysis of Multiblock Inertial ADMM

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

The nonconvex global consensus problem with regularization [1] has the following form: $\begin{matrix} (1) & \min \sum_{i = 1}^{N} f_{i}(x) + g(x) \\ & \text{s.t. } x \in X, \end{matrix}$ where $f_{i} : R^{n} \to R \cup \{+ \infty\}$, $i = 1, 2, \dots, N$, are smooth, possibly nonconvex functions, while $g : R^{n} \to R$ is a convex nonsmooth regularization term and $X$ is a closed convex set. This problem is related to the convex global consensus problem, which has been studied extensively [2], except that the $f_{i}$'s may be nonconvex.

In many practical applications, each $f_{i}$ needs to be handled by a single agent, such as a thread or a processor. We therefore transform problem (1) into the following equivalent linearly constrained problem with the help of new variables $\{x_{i}\}_{i = 0}^{N}$: $\begin{matrix} (2) & \min \sum_{i = 1}^{N} f_{i}(x_{i}) + g(x_{0}) \\ & \text{s.t. } x_{i} = x_{0}, \ \forall i = 1, 2, \dots, N, \ x_{0} \in X. \end{matrix}$

Note that problem (2) has $N$ blocks of local variables $x_{1}, \dots, x_{N}$ and one global variable $x_{0}$. Each distributed agent can then handle a single local variable $x_{i}$ and a local function $f_{i}$.

The augmented Lagrangian function of problem (2) with multipliers $y_{i} \in R^{n}$, $i = 1, 2, \dots, N$, is defined as follows: $\begin{matrix} (3) & L_{ρ}(x_{i}, x_{0}, y) = \sum_{i = 1}^{N} f_{i}(x_{i}) + g(x_{0}) + \sum_{i = 1}^{N} \langle y_{i}, x_{i} - x_{0} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i} - x_{0} \|^{2}, \end{matrix}$ where $ρ > 0$ is a penalty parameter. Problem (2) can be solved in a distributed manner by the following classical ADMM procedure: $\begin{matrix} (4) & \begin{matrix} x_{0}^{k + 1} = \underset{x_{0} \in X}{\operatorname{argmin}} \ L_{ρ}(x_{i}^{k}, x_{0}, y^{k}), \\ x_{i}^{k + 1} = \underset{x_{i}}{\operatorname{argmin}} \ f_{i}(x_{i}) + \langle y_{i}^{k}, x_{i} - x_{0}^{k + 1} \rangle + \frac{ρ}{2} \| x_{i} - x_{0}^{k + 1} \|^{2}, \\ y_{i}^{k + 1} = y_{i}^{k} + ρ (x_{i}^{k + 1} - x_{0}^{k + 1}), \\ i = 1, 2, \dots, N. \end{matrix} \end{matrix}$
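The classical scheme (4) can be sketched on a toy instance. The choices below are assumptions made purely for illustration and do not come from the paper: $f_{i}(x) = \frac{1}{2} \| x - a_{i} \|^{2}$ with hypothetical data `a`, $g = λ \| \cdot \|_{1}$, $X = R^{n}$, and $ρ = 1$. With these choices every subproblem has a closed form, and the minimizer of (1) is the soft shrinkage of the mean of the $a_{i}$ with threshold $λ / N$, which gives a simple check.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft shrinkage: argmin_x kappa*|x| + 0.5*(x - v)^2."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def consensus_admm(a, lam, rho=1.0, iters=500):
    """Scheme (4) for f_i(x) = 0.5*||x - a_i||^2, g = lam*||.||_1, X = R^n.

    a has shape (N, n); row i holds the illustrative data a_i of agent i.
    """
    N, n = a.shape
    x = np.zeros((N, n))   # local variables x_i
    y = np.zeros((N, n))   # multipliers y_i
    x0 = np.zeros(n)       # global variable x_0
    for _ in range(iters):
        # x_0-step: completing the square gives a prox of g at the average
        # of x_i + y_i/rho, with threshold lam/(rho*N).
        x0 = soft_threshold((x + y / rho).mean(axis=0), lam / (rho * N))
        # x_i-step: solve (x_i - a_i) + y_i + rho*(x_i - x0) = 0 per agent.
        x = (a - y + rho * x0) / (1.0 + rho)
        # Multiplier step: y_i <- y_i + rho*(x_i - x0).
        y = y + rho * (x - x0)
    return x0, x, y

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 6))          # 4 agents, dimension 6
x0, x, y = consensus_admm(a, lam=0.5)
```

Each agent touches only its own row of `x` and `y`; only the averaging in the $x_{0}$-step needs communication, which is what makes the scheme distributed.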

ADMM was initially introduced in the 1970s [3, 4], and its convergence properties in the convex case have been extensively studied. However, ADMM or its direct extension may fail to converge when the objective contains a nonconvex function. Yang et al. [5] studied the convergence of ADMM for a nonconvex optimization model arising from background/foreground extraction. Hong et al. [6] analyzed the convergence of the alternating direction method of multipliers for a family of nonconvex problems. Guo et al. [7] studied the convergence of ADMM for multiblock nonconvex separable optimization models.

Recently, some scholars have studied inertial variants of ADMM for convex optimization. For example, Chen et al. [8] analyzed a class of inertial ADMM for linearly constrained separable convex optimization, and Moudafi and Elissabeth [9] extended the inertial technique to the maximal monotone operator inclusion problem. Interest in the nonconvex case has grown in recent years; e.g., Chao et al. [10] proposed and analyzed an inertial proximal ADMM for a class of nonconvex optimization problems. However, all of the above inertial ADMM algorithms were designed for two-block problems only, not for the multiblock case. Is the convergence of inertial ADMM still assured when the number of blocks exceeds two? This is an important question.

The purpose of the present study is to examine the convergence of multiblock inertial ADMM for the nonconvex consensus problem under the assumption that the potential function satisfies the Kurdyka–Lojasiewicz property. Preliminary numerical results show the effectiveness of the proposed algorithm.

The rest of this paper is organized as follows. In Section 2, some necessary preliminaries for further analysis are summarized. Section 3 proposes a multiblock nonconvex inertial ADMM algorithm and analyzes its convergence under some conditions. In Section 4, we illustrate the effectiveness of the algorithm with a numerical experiment. Finally, some conclusions are drawn in Section 5.

2. Preliminaries

Let $R^{n}$ denote the $n$-dimensional Euclidean space, $R \cup \{+ \infty\}$ the extended real number set, and N the set of natural numbers. $\| \cdot \|$ represents the Euclidean norm. Let $dom f := \{x \in R^{n} : f(x) < + \infty\}$ denote the domain of a function $f : R^{n} \to R \cup \{+ \infty\}$ and $\langle x, y \rangle = \sum_{i = 1}^{n} x_{i} y_{i}$ denote the inner product. If $f(\bar{x}) \leq \liminf_{x \to \bar{x}} f(x)$, we say that $f$ is lower semicontinuous at $\bar{x}$. If $f$ is lower semicontinuous at every point $x \in R^{n}$, we say that $f$ is a lower semicontinuous function.

For a set $S \subset R^{n}$ and a point $x \in R^{n}$, let $d(x, S) = \inf_{y \in S} \| y - x \|^{2}$. If $S = \emptyset$, we set $d(x, S) = + \infty$ for all $x \in R^{n}$.

The Lagrangian function of (2), with multiplier $y = (y_{1}, y_{2}, \dots, y_{N})^{T}$, is defined as $\begin{matrix} (5) & L(x_{i}, x_{0}, y) = \sum_{i = 1}^{N} f_{i}(x_{i}) + g(x_{0}) + \sum_{i = 1}^{N} \langle y_{i}, x_{i} - x_{0} \rangle. \end{matrix}$

Definition 1.

If $w^{*} = (x_{i}^{*}, x_{0}^{*}, y^{*})^{T}$ satisfies $\begin{matrix} (6) & \begin{matrix} \nabla_{x_{i}} f_{i}(x_{i}^{*}) = - y_{i}^{*}, \\ \sum_{i = 1}^{N} y_{i}^{*} \in \partial g(x_{0}^{*}), \\ x_{i}^{*} - x_{0}^{*} = 0, \end{matrix} \end{matrix}$ then $w^{*}$ is called a critical point or stationary point of the Lagrangian function $L(x_{i}, x_{0}, y)$.

An important technique for proving the convergence of ADMM for nonconvex optimization problems relies on the assumption that the potential function satisfies the following Kurdyka–Lojasiewicz property (KL property) [11–14].

For notational simplicity, we use $Ψ_{ε_{2}}$ ($ε_{2} > 0$) to denote the set of concave functions $ϕ : [0, ε_{2}) \to [0, + \infty)$ such that

(i) $ϕ(0) = 0$, and $ϕ$ is continuously differentiable on $(0, ε_{2})$ and continuous at $0$

(ii) $ϕ'(s) > 0$, $\forall s \in (0, ε_{2})$

Definition 2.

(see [14]) (KL property). Let $f : R^{n} \to R \cup \{+ \infty\}$ be a proper lower semicontinuous function. If there exist $ε_{2} \in (0, + \infty]$, a neighborhood $U$ of $x^{*}$, and a function $ϕ \in Ψ_{ε_{2}}$ such that for all $x \in U \cap \{x : f(x^{*}) < f(x) < f(x^{*}) + ε_{2}\}$ it holds that $\begin{matrix} (7) & ϕ'(f(x) - f(x^{*})) \, d(0, \partial f(x)) \geq 1, \end{matrix}$ then $f$ is said to have the KL property at $x^{*}$.
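As a small illustration of inequality (7), consider the one-dimensional function $f(x) = x^{2}$ at $x^{*} = 0$ with the concave function $ϕ(s) = 2 \sqrt{s}$. Both choices are assumptions made for this example and do not appear in the paper. Here $ϕ'(s) = 1 / \sqrt{s}$, and the subdifferential distance is taken to be $| f'(x) | = 2 | x |$, so the product in (7) equals $2$ for every $x \neq 0$. The sketch below checks this numerically.

```python
import numpy as np

def f(x):
    """Illustrative smooth function with minimizer x* = 0."""
    return x ** 2

def grad_f(x):
    return 2.0 * x

def phi_prime(s):
    """Derivative of the illustrative concave function phi(s) = 2*sqrt(s)."""
    return 1.0 / np.sqrt(s)

x_star = 0.0
# Sample points near x* on both sides, excluding x* itself.
xs = np.concatenate([np.linspace(-1.0, -1e-3, 50), np.linspace(1e-3, 1.0, 50)])
# Left-hand side of (7): phi'(f(x) - f(x*)) * |f'(x)|.
kl_products = phi_prime(f(xs) - f(x_star)) * np.abs(grad_f(xs))
```

The array `kl_products` stays at or above $1$ on the whole sample, which is what (7) requires near $x^{*}$.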

3. Algorithm and Convergence Analysis

For convenience, we fix the following notation: $w^{k} = (x_{i}^{k}, y^{k}, x_{0}^{k})^{T}$, $\hat{w}^{k} = (x_{i}^{k}, y^{k}, x_{0}^{k}, x_{i}^{k - 1}, y^{k - 1}, x_{0}^{k - 1})^{T}$. Based on (4), we propose the following algorithm for solving problem (2).

Algorithm 1.

Inertial ADMM (IADMM). Choose $x_{0}^{0} \in R^{n}$, $y_{i}^{0} \in R^{n}$, $x_{i}^{0} \in R^{n}$, $i = 1, 2, \dots, N$, $τ_{k} > 0$, $ρ > 0$, and $θ_{k} \in [0, 1)$, $\forall k \geq 1$. For the given point $w^{k} = (x_{i}^{k}, y^{k}, x_{0}^{k})^{T}$, consider the iterative scheme: $\begin{matrix} (8) & \begin{matrix} x_{0}^{k + 1} = \underset{x_{0} \in X}{\operatorname{argmin}} \ g(x_{0}) + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{i}^{k} - x_{0} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0} \|^{2} + \frac{τ_{k}}{2} \| x_{0} - z_{0}^{k} \|^{2}, \quad (a) \\ x_{i}^{k + 1} = \underset{x_{i}}{\operatorname{argmin}} \ f_{i}(x_{i}) + \langle y_{i}^{k}, x_{i} - x_{0}^{k + 1} \rangle + \frac{ρ}{2} \| x_{i} - x_{0}^{k + 1} \|^{2} + \frac{τ_{k}}{2} \| x_{i} - z_{i}^{k} \|^{2}, \quad (b) \\ y_{i}^{k + 1} = y_{i}^{k} + ρ (x_{i}^{k + 1} - x_{0}^{k + 1}) + τ_{k} (x_{i}^{k + 1} - z_{i}^{k}), \quad (c) \end{matrix} \end{matrix}$ where $\begin{matrix} (9) & \begin{matrix} z_{0}^{k} = x_{0}^{k} + θ_{k} (x_{0}^{k} - x_{0}^{k - 1}), \\ z_{i}^{k} = x_{i}^{k} + θ_{k} (x_{i}^{k} - x_{i}^{k - 1}), \end{matrix} \end{matrix}$ for $i = 1, 2, \dots, N$.

From the optimality conditions of (8) (a) and (8) (b), we have $\begin{matrix} (10) & 0 \in \partial g(x_{0}^{k + 1}) - \sum_{i = 1}^{N} y_{i}^{k} - ρ \sum_{i = 1}^{N} (x_{i}^{k} - x_{0}^{k + 1}) + τ_{k} (x_{0}^{k + 1} - z_{0}^{k}), \\ (11) & 0 = \nabla_{x_{i}} f_{i}(x_{i}^{k + 1}) + y_{i}^{k} + ρ (x_{i}^{k + 1} - x_{0}^{k + 1}) + τ_{k} (x_{i}^{k + 1} - z_{i}^{k}), \\ & i = 1, 2, \dots, N. \end{matrix}$

Remark 1.

Compared with the inertial ADMM in [10], each subproblem in our algorithm contains an inertial term, and we handle the multiblock case.
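A minimal sketch of scheme (8)-(9) follows, again on an illustrative toy instance that is not taken from the paper: $f_{i}(x) = \frac{1}{2} \| x - a_{i} \|^{2}$ with hypothetical data `a`, $g = λ \| \cdot \|_{1}$, $X = R^{n}$, and constant parameters $τ_{k} = τ$, $θ_{k} = θ$. For this choice the optimality condition of (8) (b) combined with (8) (c) gives $y_{i}^{k + 1} = - \nabla f_{i}(x_{i}^{k + 1}) = a_{i} - x_{i}^{k + 1}$, which the code reproduces after every iteration.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft shrinkage: argmin_x kappa*|x| + 0.5*(x - v)^2."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def iadmm(a, lam, rho, tau, theta, iters):
    """Scheme (8)-(9) for f_i(x) = 0.5*||x - a_i||^2, g = lam*||.||_1, X = R^n.

    a has shape (N, n); tau and theta are kept constant over iterations.
    """
    N, n = a.shape
    x = np.zeros((N, n)); x_prev = x.copy()
    x0 = np.zeros(n); x0_prev = x0.copy()
    y = np.zeros((N, n))
    for _ in range(iters):
        # (9): inertial extrapolation points built from the last two iterates.
        z0 = x0 + theta * (x0 - x0_prev)
        z = x + theta * (x - x_prev)
        x0_prev, x_prev = x0, x
        # (8a): the quadratic terms combine into one proximal step for g.
        c = rho * N + tau
        u = (y.sum(axis=0) + rho * x.sum(axis=0) + tau * z0) / c
        x0 = soft_threshold(u, lam / c)
        # (8b): closed form, since f_i is quadratic in this toy instance.
        x = (a - y + rho * x0 + tau * z) / (1.0 + rho + tau)
        # (8c): multiplier update with the extra inertial correction.
        y = y + rho * (x - x0) + tau * (x - z)
    return x0, x, y

rng = np.random.default_rng(1)
a = rng.normal(size=(3, 5))
x0, x, y = iadmm(a, lam=0.5, rho=10.0, tau=1.0, theta=0.2, iters=50)
```

Setting `theta=0.0` removes the extrapolation, so that `z0` and `z` coincide with the current iterates and the scheme reduces to a proximal variant of (4).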

Subsequently, we will discuss the convergence of Algorithm 1 under the following assumptions.

Assumption 1.

(i) $g(x)$ is proper lower semicontinuous, and $\nabla_{x_{i}} f_{i}(x_{i})$ is $l_{f_{i}}$-Lipschitz continuous; i.e., $\begin{matrix} (12) & \| \nabla_{x_{i}} f_{i}(x_{i}^{k + 1}) - \nabla_{x_{i}} f_{i}(x_{i}^{k}) \| \leq l_{f_{i}} \| x_{i}^{k + 1} - x_{i}^{k} \|. \end{matrix}$

(ii) $ρ$ is large enough, such that $0 \leq θ_{k} < (ρ - l_{f}^{2}) / (2 ρ + 2)$ and $τ_{k} > 2 l_{f}^{2} / (ρ - 2 ρ θ_{k} - l_{f}^{2} - 2 θ_{k})$.
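The parameter conditions in Assumption 1 (ii) can be checked mechanically. The sketch below assumes the grouping $θ_{k} < (ρ - l_{f}^{2}) / (2 ρ + 2)$ and $τ_{k} > 2 l_{f}^{2} / (ρ - 2 ρ θ_{k} - l_{f}^{2} - 2 θ_{k})$; under this reading the first bound is exactly the condition that the denominator of the second is positive. The function names are hypothetical helpers, not part of the paper.

```python
def theta_upper_bound(rho, l_f):
    """Upper bound on theta_k: (rho - l_f^2) / (2*rho + 2)."""
    return (rho - l_f ** 2) / (2.0 * rho + 2.0)

def tau_lower_bound(rho, l_f, theta):
    """Lower bound on tau_k: 2*l_f^2 / (rho - 2*rho*theta - l_f^2 - 2*theta)."""
    denom = rho - 2.0 * rho * theta - l_f ** 2 - 2.0 * theta
    if denom <= 0:
        raise ValueError("theta is too large for this rho and l_f")
    return 2.0 * l_f ** 2 / denom

def satisfies_assumption(rho, l_f, theta, tau):
    """True when (theta, tau) meet both inequalities for the given rho, l_f."""
    if not (0.0 <= theta < theta_upper_bound(rho, l_f)):
        return False
    return tau > tau_lower_bound(rho, l_f, theta)

# Example values: rho = 600 with an assumed Lipschitz constant l_f = 1.
bound_theta = theta_upper_bound(600.0, 1.0)
bound_tau = tau_lower_bound(600.0, 1.0, 0.0)
ok = satisfies_assumption(600.0, 1.0, 0.0, 1.5 * bound_tau)
```

Choosing $τ_{k}$ as a fixed multiple larger than one of the lower bound, as in the last line, always satisfies the second inequality once $θ_{k}$ is admissible.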

Lemma 1.

For each $k \in Ν$, with $l_{f} = \max \{l_{f_{i}}\}_{i = 1, 2, \dots, N}$, we have $\begin{matrix} (13) & \| y_{i}^{k + 1} - y_{i}^{k} \|^{2} \leq l_{f_{i}}^{2} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} \leq l_{f}^{2} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2}. \end{matrix}$

Proof.

Since $y_{i}^{k + 1} = y_{i}^{k} + ρ (x_{i}^{k + 1} - x_{0}^{k + 1}) + τ_{k} (x_{i}^{k + 1} - z_{i}^{k})$, from (11), one has $\begin{matrix} (14) & y_{i}^{k + 1} = - \nabla_{x_{i}} f_{i}(x_{i}^{k + 1}). \end{matrix}$

Thus, $\begin{matrix} (15) & \| y_{i}^{k + 1} - y_{i}^{k} \|^{2} = \| \nabla_{x_{i}} f_{i}(x_{i}^{k + 1}) - \nabla_{x_{i}} f_{i}(x_{i}^{k}) \|^{2} \\ & \leq l_{f_{i}}^{2} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} \\ & \leq l_{f}^{2} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2}. \end{matrix}$

Hence, the result is obtained.

Lemma 2.

Suppose that Assumption 1 holds with $ρ$ large enough. Then, for each $k \in Ν$, $\begin{matrix} (16) & L_{ρ}(w^{k + 1}) + \sum_{i = 1}^{N} γ_{1} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + γ_{1} \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \\ & \leq L_{ρ}(w^{k}) + \sum_{i = 1}^{N} γ_{2} \| x_{i}^{k} - x_{i}^{k - 1} \|^{2} + γ_{2} \| x_{0}^{k} - x_{0}^{k - 1} \|^{2}, \end{matrix}$ where $γ_{1} = \frac{τ_{k}}{2} (1 - θ_{k}) - \left( \frac{1}{ρ} + \frac{τ_{k}}{2 ρ} \right) l_{f}^{2} - \frac{τ_{k}}{ρ}$ and $γ_{2} = \frac{τ_{k} θ_{k}}{2} + \frac{τ_{k} θ_{k}}{ρ}$.

Proof.

By the definition of the augmented Lagrangian function, (8) (c), and (15), we have $\begin{matrix} (17) & L_{ρ}(x_{i}^{k + 1}, x_{0}^{k + 1}, y^{k + 1}) - L_{ρ}(x_{i}^{k + 1}, x_{0}^{k + 1}, y^{k}) \\ & = \sum_{i = 1}^{N} \langle y_{i}^{k + 1} - y_{i}^{k}, x_{i}^{k + 1} - x_{0}^{k + 1} \rangle \\ & = \frac{1}{ρ} \sum_{i = 1}^{N} \langle y_{i}^{k + 1} - y_{i}^{k}, y_{i}^{k + 1} - y_{i}^{k} - τ_{k} (x_{i}^{k + 1} - z_{i}^{k}) \rangle \\ & \leq \left( \frac{1}{ρ} + \frac{τ_{k}}{2 ρ} \right) \sum_{i = 1}^{N} \| y_{i}^{k + 1} - y_{i}^{k} \|^{2} + \frac{τ_{k}}{2 ρ} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - z_{i}^{k} \|^{2} \\ & \leq \left( \frac{1}{ρ} + \frac{τ_{k}}{2 ρ} \right) l_{f}^{2} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + \frac{τ_{k}}{2 ρ} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - z_{i}^{k} \|^{2}. \end{matrix}$

From (8) (a) and (8) (b), we obtain $\begin{matrix} (18) & g(x_{0}^{k + 1}) + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{i}^{k} - x_{0}^{k + 1} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0}^{k + 1} \|^{2} + \frac{τ_{k}}{2} \| x_{0}^{k + 1} - z_{0}^{k} \|^{2} \\ & \leq g(x_{0}^{k}) + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{i}^{k} - x_{0}^{k} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0}^{k} \|^{2} + \frac{τ_{k}}{2} \| x_{0}^{k} - z_{0}^{k} \|^{2}, \text{ and} \\ & \sum_{i = 1}^{N} f_{i}(x_{i}^{k + 1}) + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{i}^{k + 1} - x_{0}^{k + 1} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{0}^{k + 1} \|^{2} + \frac{τ_{k}}{2} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - z_{i}^{k} \|^{2} \\ & \leq \sum_{i = 1}^{N} f_{i}(x_{i}^{k}) + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{i}^{k} - x_{0}^{k + 1} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0}^{k + 1} \|^{2} + \frac{τ_{k}}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - z_{i}^{k} \|^{2}, \end{matrix}$ respectively. Rearranging terms gives $\begin{matrix} (19) & g(x_{0}^{k + 1}) - g(x_{0}^{k}) + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{0}^{k} - x_{0}^{k + 1} \rangle - \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0}^{k} \|^{2} \\ & \leq \frac{τ_{k}}{2} \| x_{0}^{k} - z_{0}^{k} \|^{2} - \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0}^{k + 1} \|^{2} - \frac{τ_{k}}{2} \| x_{0}^{k + 1} - z_{0}^{k} \|^{2}, \text{ and} \\ & \sum_{i = 1}^{N} f_{i}(x_{i}^{k + 1}) - \sum_{i = 1}^{N} f_{i}(x_{i}^{k}) + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{i}^{k + 1} - x_{i}^{k} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{0}^{k + 1} \|^{2} \\ & \leq \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0}^{k + 1} \|^{2} + \frac{τ_{k}}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - z_{i}^{k} \|^{2} - \frac{τ_{k}}{2} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - z_{i}^{k} \|^{2}. \end{matrix}$

Therefore, we have $\begin{matrix} (20) & L_{ρ}(x_{i}^{k + 1}, x_{0}^{k + 1}, y^{k}) - L_{ρ}(x_{i}^{k}, x_{0}^{k}, y^{k}) \\ & = L_{ρ}(x_{i}^{k + 1}, x_{0}^{k + 1}, y^{k}) - L_{ρ}(x_{i}^{k}, x_{0}^{k + 1}, y^{k}) + L_{ρ}(x_{i}^{k}, x_{0}^{k + 1}, y^{k}) - L_{ρ}(x_{i}^{k}, x_{0}^{k}, y^{k}) \\ & = \sum_{i = 1}^{N} f_{i}(x_{i}^{k + 1}) - \sum_{i = 1}^{N} f_{i}(x_{i}^{k}) + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{i}^{k + 1} - x_{i}^{k} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{0}^{k + 1} \|^{2} \\ & \quad + g(x_{0}^{k + 1}) - g(x_{0}^{k}) - \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0}^{k} \|^{2} + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{0}^{k} - x_{0}^{k + 1} \rangle \\ & \leq \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0}^{k + 1} \|^{2} + \frac{τ_{k}}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - z_{i}^{k} \|^{2} - \frac{τ_{k}}{2} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - z_{i}^{k} \|^{2} \\ & \quad + g(x_{0}^{k + 1}) - g(x_{0}^{k}) - \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0}^{k} \|^{2} + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{0}^{k} - x_{0}^{k + 1} \rangle \\ & \leq \frac{τ_{k}}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - z_{i}^{k} \|^{2} - \frac{τ_{k}}{2} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - z_{i}^{k} \|^{2} + \frac{τ_{k}}{2} \| x_{0}^{k} - z_{0}^{k} \|^{2} - \frac{τ_{k}}{2} \| x_{0}^{k + 1} - z_{0}^{k} \|^{2} \\ & \leq - \frac{τ_{k}}{2} (1 - θ_{k}) \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + \frac{τ_{k}}{2} θ_{k} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{i}^{k - 1} \|^{2} - \frac{τ_{k}}{2} (1 - θ_{k}) \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \\ & \quad + \frac{τ_{k}}{2} θ_{k} \| x_{0}^{k} - x_{0}^{k - 1} \|^{2}. \end{matrix}$

Adding (17) and (20) and using Assumption 1 (ii), we have $\begin{matrix} (21) & L_{ρ}(w^{k + 1}) \leq L_{ρ}(w^{k}) - \frac{τ_{k}}{2} (1 - θ_{k}) \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + \frac{τ_{k}}{2} θ_{k} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{i}^{k - 1} \|^{2} - \frac{τ_{k}}{2} (1 - θ_{k}) \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \\ & \quad + \frac{τ_{k}}{2} θ_{k} \| x_{0}^{k} - x_{0}^{k - 1} \|^{2} + \left( \frac{1}{ρ} + \frac{τ_{k}}{2 ρ} \right) \sum_{i = 1}^{N} \| y_{i}^{k + 1} - y_{i}^{k} \|^{2} + \frac{τ_{k}}{2 ρ} \sum_{i = 1}^{N} \| x_{i}^{k + 1} - z_{i}^{k} \|^{2} \\ & \leq L_{ρ}(w^{k}) - \left[ \frac{τ_{k}}{2} (1 - θ_{k}) - \left( \frac{1}{ρ} + \frac{τ_{k}}{2 ρ} \right) l_{f}^{2} - \frac{τ_{k}}{ρ} \right] \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + \left( \frac{τ_{k} θ_{k}}{2} + \frac{τ_{k} θ_{k}}{ρ} \right) \sum_{i = 1}^{N} \| x_{i}^{k} - x_{i}^{k - 1} \|^{2} \\ & \quad - \frac{τ_{k}}{2} (1 - θ_{k}) \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} + \frac{τ_{k}}{2} θ_{k} \| x_{0}^{k} - x_{0}^{k - 1} \|^{2}, \end{matrix}$ which implies that $\begin{matrix} (22) & L_{ρ}(w^{k + 1}) + \sum_{i = 1}^{N} γ_{1} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + γ_{1} \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \\ & \leq L_{ρ}(w^{k}) + \sum_{i = 1}^{N} γ_{2} \| x_{i}^{k} - x_{i}^{k - 1} \|^{2} + γ_{2} \| x_{0}^{k} - x_{0}^{k - 1} \|^{2}. \end{matrix}$

Then, the results are obtained.

Remark 2.

From Assumption 1 (ii), we know that $γ_{1} > γ_{2}$. Define the following potential regularized augmented Lagrangian function: $\begin{matrix} (23) & \hat{L}_{ρ}(x_{i}, x_{0}, y, \hat{x}_{i}, \hat{x}_{0}) = L_{ρ}(x_{i}, x_{0}, y) + \sum_{i = 1}^{N} γ_{2} \| x_{i} - \hat{x}_{i} \|^{2} + γ_{2} \| x_{0} - \hat{x}_{0} \|^{2}, \end{matrix}$ where $\hat{w} = (x_{i}, x_{0}, y, \hat{x}_{i}, \hat{x}_{0})$.

If we take $η = γ_{1} - γ_{2} > 0$ and $\hat{w}^{k} = (x_{i}^{k}, x_{0}^{k}, y^{k}, x_{i}^{k - 1}, x_{0}^{k - 1})$, then $\begin{matrix} (24) & \hat{L}_{ρ}(\hat{w}^{k}) = L_{ρ}(w^{k}) + \sum_{i = 1}^{N} γ_{2} \| x_{i}^{k} - x_{i}^{k - 1} \|^{2} + γ_{2} \| x_{0}^{k} - x_{0}^{k - 1} \|^{2}. \end{matrix}$

From Lemma 2, we have $\begin{matrix} (25) & \hat{L}_{ρ}(\hat{w}^{k + 1}) + \sum_{i = 1}^{N} η \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + η \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \leq \hat{L}_{ρ}(\hat{w}^{k}), \end{matrix}$ which implies that the whole sequence $\{\hat{L}_{ρ}(\hat{w}^{k})\}_{k \geq 1}$ is monotonically nonincreasing. This is important for our convergence analysis.
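To monitor the descent property numerically, the potential function can be evaluated directly from its definition as the augmented Lagrangian (3) plus $γ_{2}$ times the squared distances to the previous iterates. The sketch below is one possible implementation; the callables `f_list` and `g`, the array layout, and the sample values are assumptions chosen for illustration.

```python
import numpy as np

def augmented_lagrangian(f_list, g, x, x0, y, rho):
    """L_rho from (3); x and y have shape (N, n), x0 has shape (n,)."""
    r = x - x0                                   # residuals x_i - x_0
    smooth = sum(f(xi) for f, xi in zip(f_list, x))
    return smooth + g(x0) + np.sum(y * r) + 0.5 * rho * np.sum(r ** 2)

def potential(f_list, g, x, x0, y, x_prev, x0_prev, rho, gamma2):
    """L_rho plus gamma2 times the squared steps from the previous iterates."""
    return (augmented_lagrangian(f_list, g, x, x0, y, rho)
            + gamma2 * np.sum((x - x_prev) ** 2)
            + gamma2 * np.sum((x0 - x0_prev) ** 2))

# One-block, one-dimensional sample values that can be verified by hand.
f_list = [lambda v: 0.5 * float(v @ v)]          # f_1(x) = 0.5*||x||^2
g = lambda v: float(np.abs(v).sum())             # g(x) = ||x||_1
x = np.array([[2.0]]); x0 = np.array([1.0]); y = np.array([[0.5]])
value = potential(f_list, g, x, x0, y,
                  x_prev=np.array([[1.5]]), x0_prev=np.array([1.0]),
                  rho=2.0, gamma2=0.1)
```

For the sample values, $L_{ρ} = 2 + 1 + 0.5 + 1 = 4.5$ and the inertial penalty adds $0.1 \cdot 0.25$, so `value` is $4.525$. Logging this quantity along the iterations gives a direct check of the nonincreasing behavior.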

Lemma 3.

If the sequence $\{w^{k}\}$ with $w^{k} = (x_{i}^{k}, x_{0}^{k}, y^{k})^{T}$ is bounded, then $\sum_{k = 0}^{+ \infty} \| w^{k + 1} - w^{k} \|^{2} < + \infty$.

Proof.

Since the sequence $\{w^{k}\}$ is bounded, $\{\hat{w}^{k}\}$ is bounded as well, so there exists a subsequence $\{\hat{w}^{k_{j}}\}$ such that $\lim_{j \to + \infty} \hat{w}^{k_{j}} = \hat{w}^{*}$. Since $g$ is lower semicontinuous and each $f_{i} : R^{n} \to R \cup \{+ \infty\}$ is Lipschitz differentiable, the function $\hat{L}_{ρ}(\cdot)$ is lower semicontinuous, which leads to $\liminf_{j \to + \infty} \hat{L}_{ρ}(\hat{w}^{k_{j}}) \geq \hat{L}_{ρ}(\hat{w}^{*})$; thus, $\{\hat{L}_{ρ}(\hat{w}^{k_{j}})\}$ is bounded from below. From Lemma 2, we know that $\{\hat{L}_{ρ}(\hat{w}^{k})\}$ is nonincreasing; thus, $\{\hat{L}_{ρ}(\hat{w}^{k_{j}})\}$ is convergent and $\hat{L}_{ρ}(\hat{w}^{k}) \geq \hat{L}_{ρ}(\hat{w}^{*})$ for each $k$.

Lemma 2 yields $\begin{matrix} (26) & η \left( \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \right) \leq \hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{k + 1}). \end{matrix}$

Hence, $\begin{matrix} (27) & \sum_{k = 1}^{t} η \left( \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \right) \leq \hat{L}_{ρ}(\hat{w}^{1}) - \hat{L}_{ρ}(\hat{w}^{t + 1}) \leq \hat{L}_{ρ}(\hat{w}^{1}) - \hat{L}_{ρ}(\hat{w}^{*}). \end{matrix}$

Letting $t \to + \infty$ shows that the squared steps in $x_{i}$ and $x_{0}$ are summable, and Lemma 1 bounds $\| y_{i}^{k + 1} - y_{i}^{k} \|^{2}$ by $l_{f}^{2} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2}$. Consequently, $\sum_{k = 0}^{+ \infty} \| w^{k + 1} - w^{k} \|^{2} < + \infty$.

Lemma 4.

There exists $δ > 0$ such that $d(0, \partial \hat{L}_{ρ}(\hat{w}^{k + 1})) \leq δ β_{k}$ for each $k \in Ν$, where $\begin{matrix} (28) & β_{k} = \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \| + \| x_{0}^{k + 1} - x_{0}^{k} \| + \sum_{i = 1}^{N} \| x_{i}^{k} - x_{i}^{k - 1} \| + \| x_{0}^{k} - x_{0}^{k - 1} \|. \end{matrix}$

Proof.

From the definition of $\hat{L}_{ρ}(\hat{w})$, we have $\begin{matrix} (29) & \begin{matrix} \partial_{x_{i}} \hat{L}_{ρ}(\hat{w}^{k + 1}) = \nabla_{x_{i}} f_{i}(x_{i}^{k + 1}) + y_{i}^{k + 1} + ρ (x_{i}^{k + 1} - x_{0}^{k + 1}) + 2 γ_{2} (x_{i}^{k + 1} - x_{i}^{k}), \\ \partial_{x_{0}} \hat{L}_{ρ}(\hat{w}^{k + 1}) = \partial g(x_{0}^{k + 1}) - \sum_{i = 1}^{N} \left( y_{i}^{k + 1} + ρ (x_{i}^{k + 1} - x_{0}^{k + 1}) \right) + 2 γ_{2} (x_{0}^{k + 1} - x_{0}^{k}), \\ \partial_{y} \hat{L}_{ρ}(\hat{w}^{k + 1}) = \frac{1}{ρ} \sum_{i = 1}^{N} (y_{i}^{k + 1} - y_{i}^{k}) - \frac{τ_{k}}{ρ} \sum_{i = 1}^{N} (x_{i}^{k + 1} - x_{i}^{k}) + \frac{τ_{k}}{ρ} θ_{k} \sum_{i = 1}^{N} (x_{i}^{k} - x_{i}^{k - 1}), \\ \partial_{\hat{x}_{i}} \hat{L}_{ρ}(\hat{w}^{k + 1}) = - 2 γ_{2} \sum_{i = 1}^{N} (x_{i}^{k + 1} - x_{i}^{k}), \\ \partial_{\hat{x}_{0}} \hat{L}_{ρ}(\hat{w}^{k + 1}) = - 2 γ_{2} (x_{0}^{k + 1} - x_{0}^{k}). \end{matrix} \end{matrix}$

From Lemma 1 and the optimality conditions, we get $\begin{matrix} (30) & \begin{matrix} 0 = \nabla_{x_{i}} f_{i}(x_{i}^{k + 1}) + y_{i}^{k} + ρ (x_{i}^{k + 1} - x_{0}^{k + 1}) + τ_{k} (x_{i}^{k + 1} - z_{i}^{k}), \\ 0 \in \partial g(x_{0}^{k + 1}) - \sum_{i = 1}^{N} y_{i}^{k} - ρ \sum_{i = 1}^{N} (x_{i}^{k} - x_{0}^{k + 1}) + τ_{k} (x_{0}^{k + 1} - z_{0}^{k}), \\ y_{i}^{k + 1} = y_{i}^{k} + ρ (x_{i}^{k + 1} - x_{0}^{k + 1}) + τ_{k} (x_{i}^{k + 1} - z_{i}^{k}). \end{matrix} \end{matrix}$

From (29) and (30), we obtain $\begin{matrix} (31) & (α_{1}^{k}, α_{2}^{k}, α_{3}^{k}, α_{4}^{k}, α_{5}^{k})^{T} \in \partial \hat{L}_{ρ}(\hat{w}^{k + 1}), \end{matrix}$ where $\begin{matrix} (32) & \begin{matrix} α_{1}^{k} = y_{i}^{k + 1} - y_{i}^{k} + (2 γ_{2} - τ_{k}) (x_{i}^{k + 1} - x_{i}^{k}) + τ_{k} θ_{k} (x_{i}^{k} - x_{i}^{k - 1}), \\ α_{2}^{k} = - \sum_{i = 1}^{N} (y_{i}^{k + 1} - y_{i}^{k}) - ρ \sum_{i = 1}^{N} (x_{i}^{k + 1} - x_{i}^{k}) + (2 γ_{2} - τ_{k}) (x_{0}^{k + 1} - x_{0}^{k}) + τ_{k} θ_{k} (x_{0}^{k} - x_{0}^{k - 1}), \\ α_{3}^{k} = \frac{1}{ρ} \sum_{i = 1}^{N} (y_{i}^{k + 1} - y_{i}^{k}) - \frac{τ_{k}}{ρ} \sum_{i = 1}^{N} (x_{i}^{k + 1} - x_{i}^{k}) + \frac{τ_{k}}{ρ} θ_{k} \sum_{i = 1}^{N} (x_{i}^{k} - x_{i}^{k - 1}), \\ α_{4}^{k} = - 2 γ_{2} \sum_{i = 1}^{N} (x_{i}^{k + 1} - x_{i}^{k}), \\ α_{5}^{k} = - 2 γ_{2} (x_{0}^{k + 1} - x_{0}^{k}). \end{matrix} \end{matrix}$

Thus, $\begin{matrix} (33) & d(0, \partial \hat{L}_{ρ}(\hat{w}^{k + 1})) \leq \| (α_{1}^{k}, α_{2}^{k}, α_{3}^{k}, α_{4}^{k}, α_{5}^{k})^{T} \|. \end{matrix}$

It follows from Assumption 1 and Lemma 1 that there exists $δ > 0$ such that $d(0, \partial \hat{L}_{ρ}(\hat{w}^{k + 1})) \leq δ β_{k}$ for each $k \in Ν$.

Lemma 5.

Let $Γ(\hat{w}^{k})$ denote the cluster point set of $\{\hat{w}^{k}\}$. Then, $Γ(\hat{w}^{k})$ is a nonempty compact set, and $\lim_{k \to + \infty} d(\hat{w}^{k}, Γ(\hat{w}^{k})) = 0$.

Moreover, if $\hat{w}^{*} = (x_{i}^{*}, x_{0}^{*}, y^{*}, \hat{x}_{i}^{*}, \hat{x}_{0}^{*})^{T} \in Γ(\hat{w}^{k})$, then $w^{*} = (x_{i}^{*}, x_{0}^{*}, y^{*})^{T}$ is a critical point of the Lagrangian function $L$ of problem (2). In addition, $\hat{L}_{ρ}(\cdot)$ is finite and constant on $Γ(\hat{w}^{k})$, and $\inf_{k \in Ν} \hat{L}_{ρ}(\hat{w}^{k}) = \lim_{k \to + \infty} \hat{L}_{ρ}(\hat{w}^{k})$.

Proof.

In view of the definition of $Γ(\hat{w}^{k})$ and the boundedness of $\{\hat{w}^{k}\}$, $Γ(\hat{w}^{k})$ is nonempty and compact, and $\lim_{k \to + \infty} d(\hat{w}^{k}, Γ(\hat{w}^{k})) = 0$.

Let $\hat{w}^{*} = (x_{i}^{*}, x_{0}^{*}, y^{*}, \hat{x}_{i}^{*}, \hat{x}_{0}^{*})^{T} \in Γ(\hat{w}^{k})$. Then, there exists a subsequence $\{\hat{w}^{k_{j}}\}$ of $\{\hat{w}^{k}\}$ converging to $\hat{w}^{*}$. Since $\| w^{k + 1} - w^{k} \| \to 0$ as $k \to + \infty$, we have $\lim_{j \to + \infty} \hat{w}^{k_{j} + 1} = \hat{w}^{*}$.

Since $y_{i}^{k_{j} + 1} = y_{i}^{k_{j}} + ρ (x_{i}^{k_{j} + 1} - x_{0}^{k_{j} + 1}) + τ_{k} (x_{i}^{k_{j} + 1} - z_{i}^{k_{j}})$, and both $y_{i}^{k_{j} + 1} - y_{i}^{k_{j}}$ and $x_{i}^{k_{j} + 1} - z_{i}^{k_{j}}$ tend to zero, letting $j \to + \infty$ gives $x_{i}^{*} - x_{0}^{*} = 0$.

Let $m_{k}(x_{0}) = L_{ρ}(x_{i}^{k}, x_{0}, y^{k}) + \frac{τ_{k}}{2} \| x_{0} - z_{0}^{k} \|^{2}$. Since $x_{0}^{k_{j} + 1}$ is a minimizer in (8) (a), we have $\begin{matrix} (34) & g(x_{0}^{k_{j} + 1}) + \sum_{i = 1}^{N} \langle y_{i}^{k_{j}}, x_{i}^{k_{j}} - x_{0}^{k_{j} + 1} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k_{j}} - x_{0}^{k_{j} + 1} \|^{2} + \frac{τ_{k}}{2} \| x_{0}^{k_{j} + 1} - z_{0}^{k_{j}} \|^{2} \\ & \leq g(x_{0}^{*}) + \sum_{i = 1}^{N} \langle y_{i}^{k_{j}}, x_{i}^{k_{j}} - x_{0}^{*} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k_{j}} - x_{0}^{*} \|^{2} + \frac{τ_{k}}{2} \| x_{0}^{*} - z_{0}^{k_{j}} \|^{2}. \end{matrix}$

That is, $m_{k_{j}}(x_{0}^{k_{j} + 1}) \leq m_{k_{j}}(x_{0}^{*})$.

Thus, $\begin{matrix} (35) & \underset{j \to + \infty}{\limsup} \ m_{k_{j}}(x_{0}^{k_{j} + 1}) = \underset{j \to + \infty}{\limsup} \ g(x_{0}^{k_{j} + 1}) \leq \underset{j \to + \infty}{\limsup} \ m_{k_{j}}(x_{0}^{*}) \\ & = g(x_{0}^{*}). \end{matrix}$

Since $g(x)$ is proper lower semicontinuous, we obtain $\begin{matrix} (36) & \underset{j \to + \infty}{\liminf} \ g(x_{0}^{k_{j} + 1}) \geq g(x_{0}^{*}). \end{matrix}$

From the above, we get $\begin{matrix} (37) & \lim_{j \to + \infty} g(x_{0}^{k_{j} + 1}) = g(x_{0}^{*}). \end{matrix}$

Together with the continuity of $f_{i}$ ($i = 1, 2, \dots, N$) and the closedness of $\partial g$, we obtain $\begin{matrix} (38) & \begin{matrix} \nabla_{x_{i}} f_{i}(x_{i}^{*}) = - y_{i}^{*}, \\ \sum_{i = 1}^{N} y_{i}^{*} \in \partial g(x_{0}^{*}), \\ x_{i}^{*} - x_{0}^{*} = 0. \end{matrix} \end{matrix}$

Thus, $w^{*}$ is a critical point of the Lagrangian function $L$ of problem (2).

From (37) and Lemma 5, we have $\begin{matrix} (39) & \lim_{j \to + \infty} \hat{L}_{ρ}(\hat{w}^{k_{j} + 1}) = \hat{L}_{ρ}(\hat{w}^{*}) \\ & = L(w^{*}). \end{matrix}$

Therefore, from (39) and the descent of $\{\hat{L}_{ρ}(\hat{w}^{k})\}_{k \in N}$, we obtain $\begin{matrix} (40) & \lim_{k \to + \infty} \hat{L}_{ρ}(\hat{w}^{k}) = \hat{L}_{ρ}(\hat{w}^{*}). \end{matrix}$

Thus, $\hat{L}_{ρ}(\cdot)$ is constant on $Γ(\hat{w}^{k})$. Moreover, $\inf_{k \in N} \hat{L}_{ρ}(\hat{w}^{k}) = \lim_{k \to + \infty} \hat{L}_{ρ}(\hat{w}^{k})$.

Theorem 1.

Let $\hat{L}_{ρ}$ have the KL property at each point of $Γ(\hat{w}^{k})$. Then, the bounded sequence $\{w^{k}\}$ converges to a critical point of $L(\cdot)$. Moreover, $\begin{matrix} (41) & \sum_{k = 0}^{+ \infty} \| w^{k + 1} - w^{k} \| < + \infty. \end{matrix}$

Proof.

By Lemma 5, we have $\lim_{k \to + \infty} \hat{L}_{ρ}(\hat{w}^{k}) = \hat{L}_{ρ}(\hat{w}^{*})$ for all $\hat{w}^{*} \in Γ(\hat{w}^{k})$. We consider the following two cases:

(i) Suppose there exists an integer $k_{0}$ such that $\hat{L}_{ρ}(\hat{w}^{k_{0}}) = \hat{L}_{ρ}(\hat{w}^{*})$. From Lemma 2, for all $k > k_{0}$, we have $\begin{matrix} (42) & η \left( \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \right) \\ & \leq \hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{k + 1}) \\ & \leq \hat{L}_{ρ}(\hat{w}^{k_{0}}) - \hat{L}_{ρ}(\hat{w}^{*}) = 0. \end{matrix}$

Thus, for any $k > k_{0}$ , we have $x_{i}^{k + 1} = x_{i}^{k}, i = 1,2, \dots, N, x_{0}^{k + 1} = x_{0}^{k}$ ; therefore, for any $k > k_{0} + 1$ , it follows that ${\hat{w}}^{k + 1} = {\hat{w}}^{k}$ and the assertion holds.

(ii) Assume that $\hat{L}_{ρ}(\hat{w}^{k}) > \hat{L}_{ρ}(\hat{w}^{*})$ for all $k \in N$. Since $\lim_{k \to + \infty} d(\hat{w}^{k}, Γ(\hat{w}^{k})) = 0$, it follows that for any given $ε_{1} > 0$ there exists $k_{1} > 0$ such that $d(\hat{w}^{k}, Γ(\hat{w}^{k})) < ε_{1}$ for all $k > k_{1}$. Again, since $\lim_{k \to + \infty} \hat{L}_{ρ}(\hat{w}^{k}) = \hat{L}_{ρ}(\hat{w}^{*})$, for any given $ε_{2} > 0$ there exists $k_{2} > 0$ such that $\hat{L}_{ρ}(\hat{w}^{k}) < \hat{L}_{ρ}(\hat{w}^{*}) + ε_{2}$ for all $k > k_{2}$.

Thus, when $k > \hat{k} = \max \{k_{1}, k_{2}\}$, we have $\begin{matrix} (43) & d(\hat{w}^{k}, Γ(\hat{w}^{k})) < ε_{1}, \quad \hat{L}_{ρ}(\hat{w}^{*}) < \hat{L}_{ρ}(\hat{w}^{k}) < \hat{L}_{ρ}(\hat{w}^{*}) + ε_{2}. \end{matrix}$

Since $Γ(\hat{w}^{k})$ is a nonempty compact set and $\hat{L}_{ρ}(\cdot)$ is constant on $Γ(\hat{w}^{k})$, Definition 2 gives $ϕ'(\hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{*})) \, d(0, \partial \hat{L}_{ρ}(\hat{w}^{k})) \geq 1$ for all $k > \hat{k}$; that is, $\begin{matrix} (44) & \frac{1}{ϕ'(\hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{*}))} \leq d(0, \partial \hat{L}_{ρ}(\hat{w}^{k})). \end{matrix}$

From the concavity of $ϕ$, we have $\begin{matrix} (45) & ϕ(\hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{*})) - ϕ(\hat{L}_{ρ}(\hat{w}^{k + 1}) - \hat{L}_{ρ}(\hat{w}^{*})) \\ & \geq ϕ'(\hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{*})) \left( \hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{k + 1}) \right). \end{matrix}$

Since $ϕ'(\hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{*})) > 0$, Lemma 2 together with (44) and Lemma 4 gives $\begin{matrix} (46) & (γ_{1} - γ_{2}) \left( \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \right) \\ & \leq \hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{k + 1}) \\ & \leq \frac{ϕ(\hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{*})) - ϕ(\hat{L}_{ρ}(\hat{w}^{k + 1}) - \hat{L}_{ρ}(\hat{w}^{*}))}{ϕ'(\hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{*}))} \\ & \leq δ β_{k} \left[ ϕ(\hat{L}_{ρ}(\hat{w}^{k}) - \hat{L}_{ρ}(\hat{w}^{*})) - ϕ(\hat{L}_{ρ}(\hat{w}^{k + 1}) - \hat{L}_{ρ}(\hat{w}^{*})) \right]. \end{matrix}$

Let $Φ_{a, b} = ϕ(\hat{L}_{ρ}(\hat{w}^{a}) - \hat{L}_{ρ}(\hat{w}^{*})) - ϕ(\hat{L}_{ρ}(\hat{w}^{b}) - \hat{L}_{ρ}(\hat{w}^{*}))$. Thus, $\begin{matrix} (47) & \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \leq \frac{δ}{η} β_{k} Φ_{k, k + 1}, \end{matrix}$ for all $k > \hat{k}$. That is, $\begin{matrix} (48) & (N + 1) \left( \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \| + \| x_{0}^{k + 1} - x_{0}^{k} \| \right) \\ & \leq (N + 1) \sqrt{N + 1} \left( \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \|^{2} + \| x_{0}^{k + 1} - x_{0}^{k} \|^{2} \right)^{\frac{1}{2}} \\ & \leq 2 \sqrt{β_{k}} \sqrt{\frac{(N + 1)^{3} δ}{4 (γ_{1} - γ_{2})} Φ_{k, k + 1}}, \end{matrix}$ for all $k > \hat{k}$.

Since $a + b \geq 2 \sqrt{a b}$ for $a, b > 0$, we have $\begin{matrix} (49) & 2 \sqrt{β_{k}} \sqrt{\frac{(N + 1)^{3} δ}{4 (γ_{1} - γ_{2})} Φ_{k, k + 1}} \leq β_{k} + \frac{(N + 1)^{3} δ}{4 (γ_{1} - γ_{2})} Φ_{k, k + 1}. \end{matrix}$

From (48) and (49), we obtain $\begin{matrix} (50) & (N + 1) \left( \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \| + \| x_{0}^{k + 1} - x_{0}^{k} \| \right) \leq β_{k} + \frac{(N + 1)^{3} δ}{4 (γ_{1} - γ_{2})} Φ_{k, k + 1}. \end{matrix}$

Summing the above inequality over $k = \hat{k} + 1, \dots, p$, we have $\begin{matrix} (51) & \sum_{k = \hat{k} + 1}^{p} (N + 1) \left( \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \| + \| x_{0}^{k + 1} - x_{0}^{k} \| \right) \leq \sum_{k = \hat{k} + 1}^{p} \left( β_{k} + \frac{(N + 1)^{3} δ}{4 (γ_{1} - γ_{2})} Φ_{k, k + 1} \right). \end{matrix}$

Noting that $ϕ(\hat{L}_{ρ}(\hat{w}^{p + 1}) - \hat{L}_{ρ}(\hat{w}^{*})) > 0$, it is easy to get $\begin{matrix} (52) & \sum_{k = \hat{k} + 1}^{p} \left( \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \| + \| x_{0}^{k + 1} - x_{0}^{k} \| \right) \\ & \leq γ_{k} + \frac{(N + 1)^{3} δ}{4 (γ_{1} - γ_{2})} \left[ ϕ(\hat{L}_{ρ}(\hat{w}^{\hat{k} + 1}) - \hat{L}_{ρ}(\hat{w}^{*})) - ϕ(\hat{L}_{ρ}(\hat{w}^{p + 1}) - \hat{L}_{ρ}(\hat{w}^{*})) \right] \\ & \leq γ_{k} + \frac{(N + 1)^{3} δ}{4 (γ_{1} - γ_{2})} ϕ(\hat{L}_{ρ}(\hat{w}^{\hat{k} + 1}) - \hat{L}_{ρ}(\hat{w}^{*})), \end{matrix}$ where $\begin{matrix} (53) & γ_{k} = \sum_{i = 1}^{N} \| x_{i}^{\hat{k} + 1} - x_{i}^{\hat{k}} \| + \| x_{0}^{\hat{k} + 1} - x_{0}^{\hat{k}} \|. \end{matrix}$

Thus, $\begin{matrix} (54) & \sum_{k = 0}^{+ \infty} \left( \sum_{i = 1}^{N} \| x_{i}^{k + 1} - x_{i}^{k} \| + \| x_{0}^{k + 1} - x_{0}^{k} \| \right) < + \infty. \end{matrix}$

From Lemma 1, we get $\begin{matrix} (55) & \sum_{k = 0}^{+ \infty} \| w^{k + 1} - w^{k} \| < + \infty. \end{matrix}$

Hence $\{w^{k}\}$ is a Cauchy sequence and therefore converges. By Lemma 5, we conclude that its limit is a critical point of $L(\cdot)$.

4. Numerical Experiment

In this section, we present the results of a simple numerical example to verify the effectiveness of Algorithm 1. We consider the following compressive sensing problem: $\begin{matrix} (56) & \min λ \| x_{1} \|_{0} + λ \| x_{2} \|_{1} + \frac{1}{2} \| A x_{0} - b \|^{2}, \\ & \text{s.t. } x_{i} - x_{0} = 0, \ \forall i = 1, 2, \ x_{0} \in X, \end{matrix}$ where $A$ is an $m \times n$ feature matrix, $b \in R^{m}$ is a response vector, and $λ$ is a regularization parameter. In general, problem (56) is NP-hard. To overcome this difficulty, one may relax the $l_{0}$ norm to the $l_{1 / 2}$ norm and consider the following nonconvex problem: $\begin{matrix} (57) & \min λ \| x_{1} \|_{1 / 2}^{1 / 2} + λ \| x_{2} \|_{1} + \frac{1}{2} \| A x_{0} - b \|^{2}, \\ & \text{s.t. } x_{i} - x_{0} = 0, \ \forall i = 1, 2, \ x_{0} \in X. \end{matrix}$

Let $f_{1}(x_{1}) = λ \| x_{1} \|_{1 / 2}^{1 / 2}$, $f_{2}(x_{2}) = λ \| x_{2} \|_{1}$, $g(x_{0}) = \frac{1}{2} \| A x_{0} - b \|^{2}$, and $X = R^{n}$. We now apply Algorithm 1 to problem (57) with suitable parameters. The iterative process is as follows: $\begin{matrix} (58) & \begin{matrix} x_{0}^{k + 1} = \underset{x_{0} \in X}{\operatorname{argmin}} \ g(x_{0}) + \sum_{i = 1}^{N} \langle y_{i}^{k}, x_{i}^{k} - x_{0} \rangle + \frac{ρ}{2} \sum_{i = 1}^{N} \| x_{i}^{k} - x_{0} \|^{2} + \frac{τ_{k}}{2} \| x_{0} - z_{0}^{k} \|^{2}, \\ x_{1}^{k + 1} = \underset{x_{1}}{\operatorname{argmin}} \ f_{1}(x_{1}) + \langle y_{1}^{k}, x_{1} - x_{0}^{k + 1} \rangle + \frac{ρ}{2} \| x_{1} - x_{0}^{k + 1} \|^{2} + \frac{τ_{k}}{2} \| x_{1} - z_{1}^{k} \|^{2}, \\ x_{2}^{k + 1} = \underset{x_{2}}{\operatorname{argmin}} \ f_{2}(x_{2}) + \langle y_{2}^{k}, x_{2} - x_{0}^{k + 1} \rangle + \frac{ρ}{2} \| x_{2} - x_{0}^{k + 1} \|^{2} + \frac{τ_{k}}{2} \| x_{2} - z_{2}^{k} \|^{2}, \\ y_{1}^{k + 1} = y_{1}^{k} + ρ (x_{1}^{k + 1} - x_{0}^{k + 1}) + τ_{k} (x_{1}^{k + 1} - z_{1}^{k}), \\ y_{2}^{k + 1} = y_{2}^{k} + ρ (x_{2}^{k + 1} - x_{0}^{k + 1}) + τ_{k} (x_{2}^{k + 1} - z_{2}^{k}). \end{matrix} \end{matrix}$

Simplifying the subproblems in (58), we obtain the closed-form iterative formulas: $\begin{matrix} (59) & \begin{matrix} x_{0}^{k + 1} = \left( A^{T} A + (τ_{k} + ρ N) I \right)^{- 1} \left( A^{T} b + ρ \sum_{i = 1}^{N} x_{i}^{k} + \sum_{i = 1}^{N} y_{i}^{k} + τ_{k} z_{0}^{k} \right), \\ x_{1}^{k + 1} = H \left( \frac{τ_{k} z_{1}^{k} - y_{1}^{k} + ρ x_{0}^{k + 1}}{ρ + τ_{k}}, \frac{2 λ}{ρ + τ_{k}} \right), \\ x_{2}^{k + 1} = S \left( \frac{τ_{k} z_{2}^{k} - y_{2}^{k} + ρ x_{0}^{k + 1}}{ρ + τ_{k}}, \frac{λ}{ρ + τ_{k}} \right), \\ y_{1}^{k + 1} = y_{1}^{k} + ρ (x_{1}^{k + 1} - x_{0}^{k + 1}) + τ_{k} (x_{1}^{k + 1} - z_{1}^{k}), \\ y_{2}^{k + 1} = y_{2}^{k} + ρ (x_{2}^{k + 1} - x_{0}^{k + 1}) + τ_{k} (x_{2}^{k + 1} - z_{2}^{k}), \end{matrix} \end{matrix}$ where $z_{0}^{k} = x_{0}^{k} + θ_{k} (x_{0}^{k} - x_{0}^{k - 1})$, $z_{i}^{k} = x_{i}^{k} + θ_{k} (x_{i}^{k} - x_{i}^{k - 1})$, $i = 1, 2$, $H(\cdot, \cdot)$ is the half shrinkage operator [16], and $S(\cdot, \cdot)$ is the soft shrinkage operator applied entrywise to its first argument.
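The closed-form updates in (59) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names `soft_threshold`, `half_threshold`, and `iadmm_step` are hypothetical, $θ_{k}$ and $τ_{k}$ are passed in as plain numbers, and the half shrinkage operator is written in the standard closed form for the scalar problem $\min_{x} (x - v)^{2} + μ |x|^{1 / 2}$, which is assumed here to coincide with the operator $H$ of [16].

```python
import numpy as np

def soft_threshold(v, kappa):
    """Entrywise soft shrinkage: argmin_x 0.5*(x - v)^2 + kappa*|x|."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def half_threshold(v, mu):
    """Entrywise half shrinkage: argmin_x (x - v)^2 + mu*|x|^(1/2).

    Assumed closed form; entries with |v| below the threshold are set to zero.
    """
    v = np.atleast_1d(np.asarray(v, dtype=float))
    out = np.zeros_like(v)
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * mu ** (2.0 / 3.0)
    mask = np.abs(v) > thresh
    a = np.abs(v[mask])
    phi = np.arccos((mu / 8.0) * (a / 3.0) ** (-1.5))
    out[mask] = (2.0 / 3.0) * v[mask] * (
        1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0)
    )
    return out

def iadmm_step(A, b, x0, x0_prev, xs, xs_prev, ys, lam, rho, tau, theta):
    """One pass of the updates in (59) for N = 2 local blocks.

    xs, xs_prev, ys are lists [block 1, block 2] of length-n vectors.
    """
    n = A.shape[1]
    N = len(xs)
    # Inertial extrapolation points z_0^k and z_i^k.
    z0 = x0 + theta * (x0 - x0_prev)
    zs = [x + theta * (x - xp) for x, xp in zip(xs, xs_prev)]
    # x_0 update: solve the regularized normal equations.
    # (A real solver would factor this matrix once and reuse it.)
    rhs = A.T @ b + rho * sum(xs) + sum(ys) + tau * z0
    x0_new = np.linalg.solve(A.T @ A + (tau + rho * N) * np.eye(n), rhs)
    # x_1 update: half shrinkage for the l_{1/2} term.
    v1 = (tau * zs[0] - ys[0] + rho * x0_new) / (rho + tau)
    x1_new = half_threshold(v1, 2.0 * lam / (rho + tau))
    # x_2 update: soft shrinkage for the l_1 term.
    v2 = (tau * zs[1] - ys[1] + rho * x0_new) / (rho + tau)
    x2_new = soft_threshold(v2, lam / (rho + tau))
    xs_new = [x1_new, x2_new]
    # Multiplier updates, including the inertial correction term.
    ys_new = [
        y + rho * (x - x0_new) + tau * (x - z)
        for y, x, z in zip(ys, xs_new, zs)
    ]
    return x0_new, xs_new, ys_new
```

Setting `theta = 0` makes each $z^{k}$ equal to the current iterate, which corresponds to the non-inertial cases labelled ADMM in the experiment. The caller keeps the previous iterates and passes them back in on the next call.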

The experimental data are generated as follows. We use the distributed computing toolbox in MATLAB to realize a simple distributed computation. The entries of the $m \times n$ feature matrix $A$ are drawn from the standard normal distribution $N(0, 1)$, and the nonzero entries of a sparse vector $a \in R^{n}$ are drawn from the same distribution. The parameters $b$ and $λ$ are set as $b_{i} = A_{i} a + ε$ and $λ = 0.01 A_{i}^{T} b_{i}$, where the noise vector satisfies $ε \sim N(0, 0.01 I)$. The variables $x_{0}, x_{i}, y_{i}$ are initialized to zero. The primal residual is defined as $r^{k} = \sum_{i = 1}^{N} x_{i}^{k}$. We employ $r^{k} \leq ε$ as the stopping criterion, where the tolerance is $ε = 10^{- 4}$. The numerical results are reported in Table 1, which lists the number of iterations (“Iter.”) and the computing time in seconds (“Time”) for the algorithms with different parameters for the dimensions $m = 2500$, $n = 1000$.
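The data generation described above can be sketched as follows. This is a hedged NumPy illustration rather than the MATLAB setup used for the reported results: the function name `make_problem` is hypothetical, the number of nonzero entries `n_nonzero` is an assumption (the sparsity level is not stated in the text), and the noise covariance $0.01 I$ is read as a per-entry variance of 0.01. The choice of $λ$ is left to the caller.

```python
import numpy as np

def make_problem(m, n, n_nonzero, noise_var=0.01, seed=0):
    """Draw a synthetic sparse-recovery instance (A, a, b).

    n_nonzero is a hypothetical sparsity level, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    # Feature matrix with i.i.d. standard normal entries.
    A = rng.standard_normal((m, n))
    # Sparse ground-truth vector: N(0, 1) values on a random support.
    a = np.zeros(n)
    support = rng.choice(n, size=n_nonzero, replace=False)
    a[support] = rng.standard_normal(n_nonzero)
    # Response vector with Gaussian noise of variance noise_var.
    eps = np.sqrt(noise_var) * rng.standard_normal(m)
    b = A @ a + eps
    return A, a, b

# Small instance for a quick check; the paper uses m = 2500, n = 1000.
A, a, b = make_problem(m=50, n=20, n_nonzero=4)
```

All local variables and multipliers would then be initialized to zero vectors of length `n` before the first iteration, as stated in the text.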

Table 1

Comparison of the two algorithms under different parameters.

Case	$θ$	Iter.	Time (s)	$ρ$	$τ_{k}$
Case 1 (ADMM)	0	191	22.725124	600	$3 l_{f}^{2} / ρ - 2 θ ρ - l_{f}^{2} - 2 θ$
Case 2 (IADMM)	$1 / 2 - l_{f}^{2} + 1 / ρ$	137	16.41164	600	$3 l_{f}^{2} / ρ - 2 θ ρ - l_{f}^{2} - 2 θ$
Case 3 (IADMM)	$1 / 3 - l_{f}^{2} + 1 / ρ$	159	18.777039	600	$3 l_{f}^{2} / ρ - 2 θ ρ - l_{f}^{2} - 2 θ$
Case 4 (ADMM)	0	191	22.926591	500	$3 l_{f}^{2} / ρ - 2 θ ρ - l_{f}^{2} - 2 θ$
Case 5 (IADMM)	$1 / 2 - l_{f}^{2} + 1 / ρ$	163	19.139843	500	$3 l_{f}^{2} / ρ - 2 θ ρ - l_{f}^{2} - 2 θ$
Case 6 (IADMM)	$1 / 3 - l_{f}^{2} + 1 / ρ$	178	21.54476	500	$3 l_{f}^{2} / ρ - 2 θ ρ - l_{f}^{2} - 2 θ$

The values of $r^{k}$ versus the iteration number are plotted in Figures 1 and 2.

[figure(s) omitted; refer to PDF]

In Table 1, $l_{f} = \max \{l_{f_{i}}\}_{i = 1,2, \dots, N}$.

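From Table 1 and Figures 1 and 2, we can see that ADMM converges more slowly than IADMM, since the iteration count of ADMM is larger than that of IADMM under the same conditions: with $ρ = 600$, ADMM needs 191 iterations whereas IADMM needs 137 and 159, and with $ρ = 500$, ADMM needs 191 iterations whereas IADMM needs 163 and 178. The computing times follow the same pattern. These numerical results show that the algorithm is feasible and effective.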
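The size of the improvement can be quantified directly from Table 1. The short sketch below computes the relative savings of each IADMM case against the ADMM case with the same $ρ$. It assumes that Cases 1 to 3 share $ρ = 600$ and Cases 4 to 6 share $ρ = 500$, and the helper name `savings_vs_admm` is hypothetical.

```python
# (case, method, rho, iterations, time in seconds), copied from Table 1.
rows = [
    ("Case 1", "ADMM", 600, 191, 22.725124),
    ("Case 2", "IADMM", 600, 137, 16.41164),
    ("Case 3", "IADMM", 600, 159, 18.777039),
    ("Case 4", "ADMM", 500, 191, 22.926591),
    ("Case 5", "IADMM", 500, 163, 19.139843),
    ("Case 6", "IADMM", 500, 178, 21.54476),
]

def savings_vs_admm(rows):
    """Percentage reduction in iterations and time relative to ADMM at equal rho."""
    baseline = {rho: (it, t) for _, method, rho, it, t in rows if method == "ADMM"}
    out = {}
    for case, method, rho, it, t in rows:
        if method == "IADMM":
            base_it, base_t = baseline[rho]
            out[case] = (
                100.0 * (base_it - it) / base_it,
                100.0 * (base_t - t) / base_t,
            )
    return out

savings = savings_vs_admm(rows)
for case, (d_iter, d_time) in savings.items():
    print(f"{case}: {d_iter:.1f}% fewer iterations, {d_time:.1f}% less time")
```

Under this reading of the table, the reduction in iteration count ranges from about 6.8% (Case 6) to about 28.3% (Case 2).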

5. Conclusion

In this paper, motivated by applications of the nonconvex global consensus problem with regularization, we propose a multiblock inertial ADMM algorithm for solving certain nonconvex global consensus problems. We prove its convergence under suitable conditions and show that any cluster point of the sequence generated by the proposed algorithm is a critical point. A numerical experiment illustrates the effectiveness of the multiblock inertial ADMM (IADMM) algorithm. The potential of the flexible multiblock inertial ADMM for analyzing and designing algorithms for other types of nonconvex problems, as well as a more thorough computational study, are topics for our further research.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (72071130; 71901145); National Social Science Fund Major Project of China (21&ZD200; 20&ZD199); Humanities and Social Sciences Research Project of the Ministry of Education (20YJC820030) and China Postdoctoral Science Foundation (2021M692047).

References

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, DOI: 10.1561/2200000016, 2010.

[2] G. Li, T. K. Pong, "Global convergence of splitting methods for nonconvex composite optimization," SIAM Journal on Optimization, vol. 25 no. 4, pp. 2434-2460, DOI: 10.1137/140998135, 2015.

[3] R. Glowinski, A. Marroco, "Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires," Revue française d'automatique, informatique, recherche opérationnelle. Analyse numérique, vol. 9 no. 2, pp. 41-76, DOI: 10.1051/m2an/197509r200411, 1975.

[4] D. Gabay, B. Mercier, "A dual algorithm for the solution of nonlinear variational problems via finite element approximation," Computers & Mathematics with Applications, vol. 2 no. 1, pp. 17-40, DOI: 10.1016/0898-1221(76)90003-1, 1976.

[5] L. Yang, T. K. Pong, X. Chen, "Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction," SIAM Journal on Imaging Sciences, vol. 10 no. 1, pp. 74-110, DOI: 10.1137/15m1027528, 2017.

[6] M. Hong, Z. Q. Luo, M. Razaviyayn, "Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems," SIAM Journal on Optimization, vol. 26 no. 1, pp. 337-364, DOI: 10.1137/140990309, 2016.

[7] K. Guo, D. Han, D. Z. W. Wang, T. Wu, "Convergence of ADMM for multi-block nonconvex separable optimization models," Frontiers of Mathematics in China, vol. 12 no. 5, pp. 1139-1162, DOI: 10.1007/s11464-017-0631-6, 2017.

[8] C. H. Chen, R. H. Chan, S. Q. Ma, J. F. Yang, "Inertial proximal ADMM for linearly constrained separable convex optimization," SIAM Journal on Imaging Sciences, vol. 8 no. 4, pp. 2239-2267, DOI: 10.1137/15100463x, 2015.

[9] A. Moudafi, E. Elissabeth, "Approximate inertial proximal methods using the enlargement of maximal monotone operators," International Journal of Pure and Applied Mathematics, vol. 5 no. 3, pp. 283-299, 2003.

[10] M. Chao, Y. Zhang, J. Jian, "An inertial proximal alternating direction method of multipliers for nonconvex optimization," International Journal of Computer Mathematics, vol. 98, 2021.

[11] K. Kurdyka, "On gradients of functions definable in o-minimal structures," Annales de l'Institut Fourier, vol. 48 no. 3, pp. 769-783, DOI: 10.5802/aif.1638, 1998.

[12] S. Lojasiewicz, "Sur la géométrie semi- et sous-analytique," Annales de l'Institut Fourier, vol. 43 no. 5, pp. 1575-1595, DOI: 10.5802/aif.1384, 1993.

[13] S. Lojasiewicz, "Une propriété topologique des sous-ensembles analytiques réels," Les Équations Aux Dérivées Partielles, 1963.

[14] J. Bolte, A. Daniilidis, O. Ley, L. Mazet, "Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity," Transactions of the American Mathematical Society, vol. 362 no. 6, pp. 3319-3363, DOI: 10.1090/s0002-9947-09-05048-x, 2009.


Copyright © 2023 Yang Liu and Yazheng Dang. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/

Abstract

The alternating direction method of multipliers (ADMM) is one of the most powerful and successful methods for solving various nonconvex consensus problems. The convergence of the conventional (i.e., 2-block) ADMM for convex objective functions has long been established. As an acceleration technique, the inertial effect has been used by many authors to solve 2-block convex optimization problems. This paper combines ADMM and the inertial effect to construct an inertial alternating direction method of multipliers (IADMM) for the multiblock nonconvex consensus problem and establishes its convergence under suitable conditions. A simulation experiment verifies the effectiveness and feasibility of the proposed method.

Details

Title

Convergence Analysis of Multiblock Inertial ADMM for Nonconvex Consensus Problem

Author

Liu, Yang¹; Dang, Yazheng²

¹ Department of Information Science and Technology, East China University of Political Science and Law, Shanghai 200237, China; China Institute for Smart Court, Shanghai Jiao Tong University, Shanghai 200030, China
² School of Management, University of Shanghai for Science and Technology, Shanghai 200093, China

Editor

Qiang Wu

Publication year

2023

Publication date

2023

Publisher

John Wiley & Sons, Inc.

ISSN

23144629

e-ISSN

23144785

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1155/2023/4316267

ProQuest document ID

2798356378
