1. Introduction
Classical optimal transport (OT) has received a lot of attention recently, in particular in machine learning for tasks such as generative networks [1] or domain adaptation [2], to name a few. It generally relies on the Wasserstein distance, which builds an optimal coupling between distributions given a notion of distance between their samples. Yet, this metric cannot be used directly whenever the distributions lie in different metric spaces, and it lacks potentially important properties, such as translation or rotation invariance of the supports of the distributions, which can be useful when comparing shapes or meshes [3,4]. In order to alleviate those problems, custom solutions have been proposed, such as [5], in which invariances are enforced by optimizing over some class of transformations, or [6], in which distributions lying in different spaces are compared by optimizing over the Stiefel manifold to project or embed one of the measures.
Apart from these works, another meaningful OT distance to tackle these problems is the Gromov–Wasserstein (GW) distance, originally proposed in [3,7,8]. It is a distance between metric spaces and has several appealing properties, such as geodesics or invariances [8]. Yet, the price to be paid lies in its computational complexity, which requires solving a nonconvex quadratic optimization problem with linear constraints. A recent line of work aims to compute approximations or relaxations of the original problem in order to spread its use in more data-intensive machine learning applications. For example, Peyré et al. [9] rely on entropic regularization and Sinkhorn iterations [10], while recent methods impose low-rank constraints on the coupling [11] or rely on a sliced approach [12] or on mini-batch estimators [13] to approximate the Gromov–Wasserstein distance. In Chowdhury et al. [4], the authors propose to partition the space and to solve the optimal transport problem between a subset of points before finding a coupling between all the points.
In this work, we study the subspace detour approach for Gromov–Wasserstein. This class of methods was first proposed for the Wasserstein setting in Muzellec and Cuturi [14] and consists of (1) projecting the measures onto a wisely chosen subspace and finding an optimal coupling between them, and (2) constructing a nearly optimal plan of the measures on the whole space using disintegration (see Section 2.2). Our main contribution is to generalize the subspace detour approach to different subspaces and to apply it to the GW distance. We derive some useful properties as well as closed-form solutions of this transport plan between Gaussian distributions. On the practical side, we provide a novel closed-form expression of the one-dimensional GW problem that allows us to efficiently compute the subspace detour transport plan when the subspaces are one-dimensional. Illustrations of the method are given on a shape matching problem, where we show good results with a cheaper computational cost compared to other GW-based methods. Interestingly enough, we also propose a separable quadratic cost for the GW problem that can be related to a triangular coupling [15], hence bridging the gap with Knothe–Rosenblatt (KR) rearrangements [16,17].
2. Background
In this section, we introduce all the necessary material to describe the subspace detours approach for classical optimal transport and relate it to the Knothe–Rosenblatt rearrangement. We show how to find couplings via the gluing lemma and measure disintegration. Then, we introduce the Gromov–Wasserstein problem for which we will derive the subspace detour in the next sections.
2.1. Classical Optimal Transport
Let $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$ be two probability measures. The set of couplings between $\mu$ and $\nu$ is defined as:
$$\Pi(\mu,\nu) = \big\{\gamma \in \mathcal{P}(X\times Y) \mid \pi^1_\#\gamma = \mu,\ \pi^2_\#\gamma = \nu\big\},$$
where $\pi^1$ and $\pi^2$ are the projections on the first and second coordinate (i.e., $\pi^1(x,y) = x$ and $\pi^2(x,y) = y$), respectively, and $\#$ is the push-forward operator, defined such that, for any measurable set $A$, $T_\#\mu(A) = \mu\big(T^{-1}(A)\big)$.
2.1.1. Kantorovitch Problem
There exist several types of couplings between probability measures, for which a non-exhaustive list can be found in [18] (Chapter 1). Among them, the so-called optimal coupling is the minimizer of the following Kantorovitch problem:
$$\inf_{\gamma \in \Pi(\mu,\nu)} \int_{X\times Y} c(x,y)\,\mathrm{d}\gamma(x,y), \quad (1)$$
with $c$ being some cost function. The Kantorovitch problem (1) is known to admit a solution when $c$ is non-negative and lower semi-continuous [19] (Theorem 1.7). When $c(x,y) = \|x-y\|_2^p$ with $p \geq 1$, it defines the so-called $p$-Wasserstein distance:
$$W_p^p(\mu,\nu) = \inf_{\gamma\in\Pi(\mu,\nu)} \int \|x-y\|_2^p\,\mathrm{d}\gamma(x,y). \quad (2)$$
When the optimal coupling is of the form $\gamma = (\mathrm{Id}\times T)_\#\mu$ with $T$ some deterministic map such that $T_\#\mu = \nu$, $T$ is called the Monge map. In one dimension, with $\mu$ atomless, the solution to (2) is a deterministic coupling of the form [19] (Theorem 2.5):
$$T = F_\nu^{-1}\circ F_\mu, \quad (3)$$
where $F_\mu$ is the cumulative distribution function of $\mu$ and $F_\nu^{-1}$ is the quantile function of $\nu$. This map is also known as the increasing rearrangement map.
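For empirical distributions with the same number of samples and uniform weights, the map (3) amounts to matching sorted samples. The following is a minimal sketch under that assumption; the function name is ours, not from [19].

```python
import numpy as np

def increasing_rearrangement(x, y):
    """Map each sample of x to a sample of y by matching quantile levels,
    i.e., T = F_nu^{-1} o F_mu on the samples (requires len(x) == len(y))."""
    order_x = np.argsort(x)      # ranks of the source samples (F_mu)
    y_sorted = np.sort(y)        # quantiles of the target (F_nu^{-1})
    T = np.empty_like(y_sorted)
    T[order_x] = y_sorted        # the i-th smallest x goes to the i-th smallest y
    return T

x = np.random.randn(100)
y = 2.0 * np.random.randn(100) + 1.0
Tx = increasing_rearrangement(x, y)  # Tx[i] is the image of x[i]
```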
2.1.2. Knothe–Rosenblatt Rearrangement
Another interesting coupling is the Knothe–Rosenblatt (KR) rearrangement, which takes advantage of the increasing rearrangement in one dimension by iterating over the dimensions and using the disintegration of the measures. Concatenating all the increasing rearrangements between the conditional probabilities, the KR rearrangement produces a nondecreasing triangular map $T_K$ (i.e., $T_K(x) = \big(T_1(x_1), T_2(x_1,x_2), \dots, T_d(x_1,\dots,x_d)\big)$ for all $x \in \mathbb{R}^d$, and, for all $j$, $T_j(x_1,\dots,x_{j-1},\cdot)$ is nondecreasing), and a deterministic coupling (i.e., $\gamma_K = (\mathrm{Id}\times T_K)_\#\mu$) [18,19,20].
Carlier et al. [21] made a connection between this coupling and optimal transport by showing that it can be obtained as the limit of optimal transport plans for a degenerated cost:
$$c_t(x,y) = \sum_{i=1}^d \lambda_i(t)\,(x_i - y_i)^2,$$
where, for all $t > 0$ and all $i$, $\lambda_i(t) > 0$, $\lambda_1(t) = 1$, and, for all $i \geq 2$, $\lambda_i(t)/\lambda_{i-1}(t) \to 0$ as $t \to 0$. This cost can be recast as in [22] as $c_t(x,y) = \sum_{i=1}^d t^{i-1}(x_i - y_i)^2$, where $t \to 0$. This formalizes into the following theorem:
Theorem 1 ([19,21]). Let μ and ν be two absolutely continuous measures on $\mathbb{R}^d$, with compact supports. Let $\gamma_t$ be an optimal transport plan for the cost $c_t$, let $T_K$ be the Knothe–Rosenblatt map between μ and ν, and $\gamma_K = (\mathrm{Id}\times T_K)_\#\mu$ the associated transport plan. Then, we have $\gamma_t \rightharpoonup \gamma_K$ weakly as $t \to 0$. Moreover, if the $\gamma_t$ are induced by transport maps $T_t$, then $T_t$ converges in $L^2(\mu)$ when $t$ tends to zero to the Knothe–Rosenblatt rearrangement.
2.2. Subspace Detours and Disintegration
Muzellec and Cuturi [14] proposed another OT problem by optimizing over the couplings that share a given plan on a subspace. More precisely, they defined subspace-optimal plans, for which the shared plan is the OT plan between the projected measures.
(Subspace-Optimal Plans [14], Definition 1). Let $\mu,\nu \in \mathcal{P}(\mathbb{R}^d)$ and let $E \subset \mathbb{R}^d$ be a $k$-dimensional subspace. Let $\gamma_E^*$ be an OT plan for the Wasserstein distance between $\mu_E = (P_E)_\#\mu$ and $\nu_E = (P_E)_\#\nu$ (with $P_E$ as the orthogonal projection on $E$). Then, the set of $E$-optimal plans between μ and ν is defined as $\Pi_E(\mu,\nu) = \{\gamma \in \Pi(\mu,\nu) \mid (P_E \times P_E)_\#\gamma = \gamma_E^*\}$.
In other words, the subspace OT plans are the transport plans of $\Pi(\mu,\nu)$ that agree on the subspace $E$ with the optimal transport plan on this subspace. To construct such a coupling $\pi \in \Pi_E(\mu,\nu)$, one can rely on the gluing lemma [18] or use the disintegration of the measure, as described in the following section.
2.2.1. Disintegration
Let $(Y,\mathcal{Y})$ and $(Z,\mathcal{Z})$ be measurable spaces, and $(X,\mathcal{X}) = (Y\times Z, \mathcal{Y}\otimes\mathcal{Z})$ the product measurable space. Then, for $\mu \in \mathcal{P}(X)$, we denote $\mu_Y = (\pi_Y)_\#\mu$ and $\mu_Z = (\pi_Z)_\#\mu$ as the marginals, where $\pi_Y$ (respectively $\pi_Z$) is the projection on $Y$ (respectively $Z$). Then, a family $\big(K(y,\cdot)\big)_{y\in Y}$ is a disintegration of $\mu$ if, for all $y \in Y$, $K(y,\cdot)$ is a measure on $Z$; for all $A \in \mathcal{Z}$, $y \mapsto K(y,A)$ is measurable; and:
$$\forall f \in C(X),\quad \int_{Y\times Z} f(y,z)\,\mathrm{d}\mu(y,z) = \int_Y \int_Z f(y,z)\,K(y,\mathrm{d}z)\,\mathrm{d}\mu_Y(y),$$
where $C(X)$ is the set of continuous functions on $X$. We can note $\mu = \mu_Y \otimes K$. $K$ is a probability kernel if, for all $y \in Y$, $K(y,Z) = 1$. The disintegration of a measure actually corresponds to conditional laws in the context of probabilities. This concept will allow us to obtain measures on the whole space from marginals on subspaces. In the case where $X = \mathbb{R}^d$, which is our setting of interest, we have existence and uniqueness of the disintegration (see Box 2.2 of [19] or Chapter 5 of [23] for the more general case).
2.2.2. Coupling on the Whole Space
Let us note $\mu_{E^\perp|E}$ and $\nu_{E^\perp|E}$ as the disintegrated measures on the orthogonal spaces (i.e., such that $\mu = \mu_E \otimes \mu_{E^\perp|E}$ and $\nu = \nu_E \otimes \nu_{E^\perp|E}$; if we have densities, $p_\mu(x_E, x_{E^\perp}) = p_{\mu_E}(x_E)\,p_{\mu_{E^\perp}|x_E}(x_{E^\perp})$). Then, to obtain a transport plan between the two original measures on the whole space, we can look for another coupling between the disintegrated measures $\mu_{E^\perp|x_E}$ and $\nu_{E^\perp|y_E}$. In particular, two such couplings are proposed in [14]: the Monge–Independent (MI) plan:
$$\pi_{MI} = \gamma_E^* \otimes \big(\mu_{E^\perp|x_E} \otimes \nu_{E^\perp|y_E}\big),$$
where we take the independent coupling between $\mu_{E^\perp|x_E}$ and $\nu_{E^\perp|y_E}$ for $\gamma_E^*$-almost every $(x_E, y_E)$, and the Monge–Knothe (MK) plan:
$$\pi_{MK} = \gamma_E^* \otimes \gamma_{E^\perp}^{x_E,y_E},$$
where $\gamma_{E^\perp}^{x_E,y_E}$ is an optimal plan between $\mu_{E^\perp|x_E}$ and $\nu_{E^\perp|y_E}$ for $\gamma_E^*$-almost every $(x_E, y_E)$. Muzellec and Cuturi [14] observed that MI is more adapted to noisy environments since it only computes the OT plan of the subspace, while MK is more suited for applications where we want to prioritize some subspace but where all the directions still contain relevant information [14]. This subspace detour approach can be of much interest following the popular assumption that two distributions on $\mathbb{R}^d$ differ only on a low-dimensional subspace, as in the spiked transport model [24]. However, it is still required to find the adequate subspace. Muzellec and Cuturi [14] propose to either rely on a priori knowledge to select the subspace (by using, e.g., a reference dataset and a principal component analysis) or to optimize over the Stiefel manifold. A discrete sketch of this construction is given below.
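The following is a hedged sketch of a discrete subspace detour using the POT library: the choice of a one-dimensional PCA subspace for each cloud and all names are our own assumptions for this example, not prescriptions from [14]. For continuous-valued samples, the 1D projections are almost surely distinct, so the plan between projections already determines the coupling of the full points (see also the discussion in Section 5).

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

def first_pca_axis(Z):
    """Leading principal direction of a point cloud."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # source samples
Y = rng.normal(size=(300, 3)) + 2.0    # target samples

xE = X @ first_pca_axis(X)             # 1D projections of the source
yE = Y @ first_pca_axis(Y)             # 1D projections of the target
a = np.full(len(X), 1 / len(X))        # uniform weights
b = np.full(len(Y), 1 / len(Y))
M = (xE[:, None] - yE[None, :]) ** 2   # 1D squared-distance cost
gamma_E = ot.emd(a, b, M)              # OT plan between the projected measures
# gamma_E couples the original points X and Y through the subspace.
```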
2.3. Gromov–Wasserstein
Formally, the Gromov–Wasserstein distance allows us to compare metric measure spaces (mm-spaces), i.e., triplets $(X, d_X, \mu_X)$ and $(Y, d_Y, \mu_Y)$, where $(X, d_X)$ and $(Y, d_Y)$ are complete separable metric spaces and $\mu_X$ and $\mu_Y$ are Borel probability measures on $X$ and $Y$ [8], respectively, by computing:
$$GW(X,Y) = \inf_{\gamma\in\Pi(\mu_X,\mu_Y)} \iint L\big(d_X(x,x'), d_Y(y,y')\big)\,\mathrm{d}\gamma(x,y)\,\mathrm{d}\gamma(x',y'),$$
where $L$ is some loss on $\mathbb{R}\times\mathbb{R}$. It has actually been extended to other spaces by replacing the distances by cost functions $c_X$ and $c_Y$, as, e.g., in [25]. Furthermore, it has many appealing properties, such as invariances (which depend on the costs).
Vayer [26] notably studied this problem in the setting where $X$ and $Y$ are Euclidean spaces, with $L(a,b) = (a-b)^2$ and $c_X(x,x') = \|x-x'\|_2^2$ or $c_X(x,x') = \langle x,x'\rangle$. In particular, let $\mu \in \mathcal{P}(\mathbb{R}^p)$ and $\nu \in \mathcal{P}(\mathbb{R}^q)$; the inner-GW problem is defined as:
$$\inf_{\gamma\in\Pi(\mu,\nu)} \iint \big(\langle x,x'\rangle - \langle y,y'\rangle\big)^2\,\mathrm{d}\gamma(x,y)\,\mathrm{d}\gamma(x',y'). \quad (4)$$
For this problem, a closed form in one dimension can be found when one of the distributions admits a density w.r.t. the Lebesgue measure:
Theorem 2 ([26], Theorem 4.2.4). Let $\mu, \nu \in \mathcal{P}(\mathbb{R})$, with μ being absolutely continuous with respect to the Lebesgue measure. Let $F_\mu(x) = \mu\big((-\infty,x]\big)$ be the cumulative distribution function and $\bar F_\mu(x) = \mu\big([x,+\infty)\big)$ the anti-cumulative distribution function. Let $T_{\mathrm{asc}} = F_\nu^{-1}\circ F_\mu$ and $T_{\mathrm{desc}} = F_\nu^{-1}\circ \bar F_\mu$. Then, an optimal solution of (4) is achieved either by $\gamma = (\mathrm{Id}\times T_{\mathrm{asc}})_\#\mu$ or by $\gamma = (\mathrm{Id}\times T_{\mathrm{desc}})_\#\mu$.
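Numerically, Theorem 2 suggests building both candidate maps from empirical CDF/quantile functions and keeping whichever has the lower inner-GW cost. The sketch below assumes uniform weights; the helper names are ours, not from [26].

```python
import numpy as np

def t_asc(x, y):
    """F_nu^{-1}(F_mu(x)) on the samples: match quantile levels."""
    ranks = (np.argsort(np.argsort(x)) + 0.5) / len(x)  # F_mu(x_i)
    return np.quantile(y, ranks)                        # F_nu^{-1}

def t_desc(x, y):
    """F_nu^{-1}(1 - F_mu(x)): the decreasing rearrangement."""
    ranks = (np.argsort(np.argsort(x)) + 0.5) / len(x)
    return np.quantile(y, 1.0 - ranks)

def inner_gw_cost(x, tx):
    """Inner-GW cost of the deterministic coupling x[i] -> tx[i]."""
    return float(((np.outer(x, x) - np.outer(tx, tx)) ** 2).mean())

x = np.random.randn(200)
y = np.random.randn(300) * 3.0  # different sample sizes are fine here
best_map = min((t_asc(x, y), t_desc(x, y)), key=lambda t: inner_gw_cost(x, t))
```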
3. Subspace Detours for GW
In this section, we propose to extend the subspace detours of Muzellec and Cuturi [14] to Gromov–Wasserstein costs. We show that we can even take subspaces of different dimensions and still obtain a coupling on the whole space using the Monge–Independent or the Monge–Knothe coupling. Then, we derive some properties analogous to those of Muzellec and Cuturi [14], as well as some closed-form solutions between Gaussians. We also provide a new closed-form expression of the inner-GW problem between one-dimensional discrete distributions and provide an illustration on a shape-matching problem.
3.1. Motivations
First, we adapt the definition of subspace optimal plans to allow for different subspaces. Indeed, since the GW distance is adapted to distributions that have their own geometry, we argue that if we project both measures on the same subspace, then the resulting coupling is likely not coherent with that of GW. To illustrate this point, we use as a source distribution one moon of the two-moons dataset and obtain a target by rotating it (see Figure 1). As GW with the squared Euclidean cost is invariant with respect to isometries, the optimal coupling is diagonal, as recovered on the left side of the figure. However, when choosing one common subspace on which to project both the source and target distributions, we completely lose the optimal coupling between them. Nonetheless, by choosing one subspace for each measure more wisely (using here the first component of the principal component analysis (PCA) decomposition of each distribution), we recover the diagonal coupling. This simple illustration underlines that the choice of both subspaces is important. A way of choosing the subspaces could be to project each dataset on the subspace containing the most information, using, e.g., PCA independently on each distribution. Muzellec and Cuturi [14] proposed to optimize the optimal transport cost with respect to an orthonormal matrix with a projected gradient descent, which could be extended to an optimization over two orthonormal matrices in our context.
By allowing for different subspaces, we obtain the following definition of subspace optimal plans:
Let $\mu \in \mathcal{P}(\mathbb{R}^p)$, $\nu \in \mathcal{P}(\mathbb{R}^q)$, $E$ be a $k$-dimensional subspace of $\mathbb{R}^p$, and $F$ a $k'$-dimensional subspace of $\mathbb{R}^q$. Let $\gamma^*_{E\times F}$ be an optimal transport plan for GW between $\mu_E = (P_E)_\#\mu$ and $\nu_F = (P_F)_\#\nu$ (with $P_E$ (resp. $P_F$) the orthogonal projection on $E$ (resp. $F$)). Then, the set of $(E,F)$-optimal plans between μ and ν is defined as $\Pi_{E,F}(\mu,\nu) = \{\gamma \in \Pi(\mu,\nu) \mid (P_E \times P_F)_\#\gamma = \gamma^*_{E\times F}\}$.
Analogously to Muzellec and Cuturi [14] (Section 2.2), we can obtain from $\gamma^*_{E\times F}$ a coupling on the whole space by either defining the Monge–Independent plan $\pi_{MI} = \gamma^*_{E\times F} \otimes \big(\mu_{E^\perp|x_E} \otimes \nu_{F^\perp|y_F}\big)$ or the Monge–Knothe plan $\pi_{MK} = \gamma^*_{E\times F} \otimes \gamma_\perp^{x_E,y_F}$, where the OT plans $\gamma_\perp^{x_E,y_F}$ between the conditionals are taken with respect to some OT cost, e.g., $W_2$.
3.2. Properties
Let $\mu \in \mathcal{P}(\mathbb{R}^p)$ and $\nu \in \mathcal{P}(\mathbb{R}^q)$ and denote:
$$GW_{E,F}(\mu,\nu) = \inf_{\gamma\in\Pi_{E,F}(\mu,\nu)} \iint L\big(c_X(x,x'), c_Y(y,y')\big)\,\mathrm{d}\gamma(x,y)\,\mathrm{d}\gamma(x',y'), \quad (5)$$
the Gromov–Wasserstein problem restricted to the subspace optimal plans of the previous definition. In the following, we show that Monge–Knothe couplings are optimal plans of this problem, which is a direct transposition of Proposition 1 in [14].
Proposition 1. Let $\mu \in \mathcal{P}(\mathbb{R}^p)$, $\nu \in \mathcal{P}(\mathbb{R}^q)$, and $\pi_{MK} = \gamma^*_{E\times F}\otimes\gamma_\perp^{x_E,y_F}$, where $\gamma^*_{E\times F}$ is an optimal coupling between $\mu_E$ and $\nu_F$, and, for $\gamma^*_{E\times F}$-almost every $(x_E,y_F)$, $\gamma_\perp^{x_E,y_F}$ is an optimal coupling between $\mu_{E^\perp|x_E}$ and $\nu_{F^\perp|y_F}$. Then, $\pi_{MK}$ achieves the infimum in (5).
Proof. Let $\gamma \in \Pi_{E,F}(\mu,\nu)$, and disintegrate it with respect to the subspace coordinates as $\gamma = \gamma^*_{E\times F}\otimes\gamma^{x_E,y_F}$. However, for $\gamma^*_{E\times F}$-a.e. $(x_E,y_F)$, the conditional coupling of $\pi_{MK}$ achieves a cost no greater than that of $\gamma^{x_E,y_F}$, by definition of the Monge–Knothe coupling. By integrating with respect to $\gamma^*_{E\times F}$, we obtain that the cost of $\pi_{MK}$ is no greater than that of $\gamma$. Therefore, $\pi_{MK}$ is optimal among subspace optimal plans. □
The key properties of GW that we would like to keep are its invariances. We show in two particular cases that we conserve them on the orthogonal spaces (since the measure on $E\times F$ is fixed).
Proposition 2. Let $\mu \in \mathcal{P}(\mathbb{R}^p)$, $\nu \in \mathcal{P}(\mathbb{R}^q)$, $E \subset \mathbb{R}^p$, $F \subset \mathbb{R}^q$.
For $c_X(x,x') = \|x-x'\|_2^2$ or $c_X(x,x') = \langle x,x'\rangle$ (and analogously for $c_Y$), (5) is invariant with respect to maps of the form $\tilde T: x \mapsto \big(x_E, T(x_{E^\perp})\big)$ (resp. $\tilde T': y \mapsto \big(y_F, T'(y_{F^\perp})\big)$), with $T$ an isometry on $E^\perp$ (resp. $T'$ an isometry on $F^\perp$) with respect to the corresponding cost ($\|\cdot\|_2^2$ or $\langle\cdot,\cdot\rangle$).
Proof. We propose a sketch of the proof; the full proof can be found in Appendix A.1. Let $\gamma \in \Pi_{E,F}(\mu,\nu)$, let $T$ be an isometry w.r.t. $c_X$ on $E^\perp$, and let $\tilde T$ be defined such that, for all $x$, $\tilde T(x) = \big(x_E, T(x_{E^\perp})\big)$.
By using Lemma 6 of [27], we show that $(\tilde T\times\mathrm{Id})_\#\Pi_{E,F}(\mu,\nu) = \Pi_{E,F}(\tilde T_\#\mu,\nu)$. Hence, for all $\gamma' \in \Pi_{E,F}(\tilde T_\#\mu,\nu)$, there exists $\gamma \in \Pi_{E,F}(\mu,\nu)$ such that $\gamma' = (\tilde T\times\mathrm{Id})_\#\gamma$. By disintegrating with respect to the subspace coordinates and using the properties of the pushforward, we can show that the objectives of $\gamma$ and $\gamma'$ coincide.
Finally, by taking the infimum with respect to $\gamma$, we find $GW_{E,F}(\tilde T_\#\mu,\nu) = GW_{E,F}(\mu,\nu)$. □
3.3. Closed-Form between Gaussians
We can also derive explicit formulas between Gaussians in particular cases. Let $\mu = \mathcal{N}(m_\mu,\Sigma) \in \mathcal{P}(\mathbb{R}^p)$ and $\nu = \mathcal{N}(m_\nu,\Lambda) \in \mathcal{P}(\mathbb{R}^q)$ be two Gaussian measures with $\Sigma \in \mathbb{R}^{p\times p}$ and $\Lambda \in \mathbb{R}^{q\times q}$. As previously, let $E$ and $F$ be $k$- and $k'$-dimensional subspaces, respectively. Following Muzellec and Cuturi [14], we represent $\Sigma$ in an orthonormal basis of $E\oplus E^\perp$ and $\Lambda$ in an orthonormal basis of $F\oplus F^\perp$, i.e.,
$$\Sigma = \begin{pmatrix}\Sigma_E & \Sigma_{EE^\perp}\\ \Sigma_{E^\perp E} & \Sigma_{E^\perp}\end{pmatrix},\qquad \Lambda = \begin{pmatrix}\Lambda_F & \Lambda_{FF^\perp}\\ \Lambda_{F^\perp F} & \Lambda_{F^\perp}\end{pmatrix}.$$
Now, let us denote the following:
$$\Sigma/\Sigma_E = \Sigma_{E^\perp} - \Sigma_{E^\perp E}\,\Sigma_E^{-1}\,\Sigma_{EE^\perp}$$
as the Schur complement of $\Sigma$ with respect to $\Sigma_E$. We know that the conditionals of Gaussians are Gaussians and that their covariances are the Schur complements (see, e.g., [28,29]). For GW, we have for now no certainty that the optimal transport plan is Gaussian. Let $\mathcal{N}_{p+q}$ denote the set of Gaussian measures on $\mathbb{R}^{p+q}$. By restricting the minimization problem to Gaussian couplings, i.e., by solving:
$$\inf_{\gamma\in\Pi(\mu,\nu)\cap\mathcal{N}_{p+q}} \iint \big(\|x-x'\|_2^2 - \|y-y'\|_2^2\big)^2\,\mathrm{d}\gamma(x,y)\,\mathrm{d}\gamma(x',y'), \quad (6)$$
Salmona et al. [30] showed that there is a solution of the form $\gamma^* = (\mathrm{Id}\times T)_\#\mu$, with
$$T(x) = m_\nu + P_\nu\,A\,P_\mu^\top (x - m_\mu), \quad (7)$$
where $\Sigma = P_\mu D_\mu P_\mu^\top$ and $\Lambda = P_\nu D_\nu P_\nu^\top$ are spectral decompositions of the covariances and $A$ is a rectangular matrix, diagonal up to signs, matching the sorted eigenvalues of $\Lambda$ with those of $\Sigma$. By combining the results of Muzellec and Cuturi [14] and Salmona et al. [30], we obtain the following closed form for Monge–Knothe couplings:
Proposition 3. Suppose $\mu = \mathcal{N}(0,\Sigma)$ and $\nu = \mathcal{N}(0,\Lambda)$. For the Gaussian-restricted GW problem (6), a Monge–Knothe transport map between $\mu$ and $\nu$ is, for all $x$, $T_{MK}(x) = Bx$, where, in the chosen bases of $E\oplus E^\perp$ and $F\oplus F^\perp$:
$$B = \begin{pmatrix} T_E & 0\\ C & T_\perp \end{pmatrix},$$
with $T_E$ being an optimal transport map between $\mu_E$ and $\nu_F$ (of the form (7)), $T_\perp$ an optimal transport map between $\mathcal{N}(0,\Sigma/\Sigma_E)$ and $\mathcal{N}(0,\Lambda/\Lambda_F)$, and $C$ satisfying the linear compatibility equation, made explicit in Appendix A.2.1, which ensures $B_\#\mu = \nu$.
Proof. See Appendix A.2.1. □
Proposition 4. Suppose that $\mu = \mathcal{N}(0,\Sigma)$ and $\nu = \mathcal{N}(0,\Lambda)$, and let $T$ be an optimal transport map between $\mu_E$ and $\nu_F$ (of the form (7)). We can derive a formula for the Monge–Independent coupling, for both the inner-GW problem and the Gaussian-restricted GW problem:
$$\pi_{MI} = \mathcal{N}\left(0, \begin{pmatrix}\Sigma & K\\ K^\top & \Lambda\end{pmatrix}\right),$$
where $K$ is the cross-covariance between the two marginals induced by $T$ on the subspaces (its blocks are made explicit in Appendix A.2.2), and where $T$ is an optimal transport map either for the inner-GW problem or for the Gaussian-restricted problem.
Proof. See Appendix A.2.2. □
3.4. Computation of Inner-GW between One-Dimensional Empirical Measures
In practice, computing the Gromov–Wasserstein distance from samples of the distributions is costly. From a computational point of view, the subspace detour approach provides an interesting alternative with better computational complexity when choosing 1D subspaces. Moreover, we have the intuition that the GW problem between measures lying on lower-dimensional subspaces has a better sample complexity than between the original measures, as is the case for the Wasserstein distance [31,32].
Below, we show that when both E and F are one-dimensional subspaces, the resulting GW problem between the projected measures can be solved in linear time. This relies on a new closed-form expression of the GW problem in 1D. Vayer et al. [12] provided a closed form for GW with the quadratic loss in one dimension between discrete measures containing the same number of points and with uniform weights. However, in our framework, the 1D projection of $\mu$ may not have uniform weights, and we would also like to be able to compare distributions with different numbers of points. We provide in the next proposition a closed-form expression for the inner-GW problem between arbitrary unidimensional discrete probability distributions:
Proposition 5. Let $\Sigma_n = \{a \in \mathbb{R}_+^n \mid \sum_{i=1}^n a_i = 1\}$ denote the probability simplex with $n$ bins. For a vector $x$, we denote $x^\downarrow$ as the vector with values sorted decreasingly, i.e., $x^\downarrow_1 \ge \dots \ge x^\downarrow_n$. Let $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ with $a \in \Sigma_n$, $b \in \Sigma_m$. Suppose that $x = x^\downarrow$ and $y = y^\downarrow$. Consider the problem:
$$\min_{\gamma\in\Pi(a,b)} \sum_{i,j,k,l} (x_i x_k - y_j y_l)^2\,\gamma_{ij}\gamma_{kl}. \quad (8)$$
Then, there exists $s \in \{-1,+1\}$ such that $\gamma = NW(a,b)$ is an optimal solution of (8) for the points $(s\,x, y)$ sorted decreasingly, where $NW(a,b)$ is the North-West corner rule defined in Algorithm 1. As a corollary, an optimal solution of (8) can be found in $O(n+m)$ operations (once the points are sorted).
Algorithm 1 North-West corner rule $NW(a,b)$: set $i = j = 1$ and $\gamma = 0$; while $i \le n$ and $j \le m$, set $\gamma_{ij} = \min(a_i, b_j)$, subtract this mass from both $a_i$ and $b_j$, and increment $i$ (resp. $j$) whenever $a_i$ (resp. $b_j$) reaches zero.
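A minimal sketch of Algorithm 1 could look as follows; the function name and the numerical tolerance are our choices.

```python
import numpy as np

def north_west_corner(a, b):
    """North-West corner rule NW(a, b): greedily fill the coupling from the
    top-left corner, moving right/down as each weight is exhausted.
    Runs in O(n + m)."""
    a, b = a.astype(float).copy(), b.astype(float).copy()
    n, m = len(a), len(b)
    gamma = np.zeros((n, m))
    i = j = 0
    while i < n and j < m:
        mass = min(a[i], b[j])
        gamma[i, j] = mass
        a[i] -= mass
        b[j] -= mass
        if a[i] <= 1e-15:  # source weight exhausted: move to the next row
            i += 1
        if b[j] <= 1e-15:  # target weight exhausted: move to the next column
            j += 1
    return gamma
```

Following Proposition 5, one would apply it to supports sorted decreasingly, once for $x$ and once for $-x$, and keep the coupling with the lower inner-GW cost.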
Proof. Let $\gamma \in \Pi(a,b)$. Then:
$$\sum_{i,j,k,l}(x_i x_k - y_j y_l)^2\gamma_{ij}\gamma_{kl} = \sum_{i,k} x_i^2 x_k^2\,a_i a_k - 2\Big(\sum_{i,j} x_i y_j\,\gamma_{ij}\Big)^2 + \sum_{j,l} y_j^2 y_l^2\,b_j b_l.$$
However, the first and last terms depend only on the marginals $a$ and $b$, so they do not depend on $\gamma$. Hence, the problem (8) is equivalent (in terms of the OT plan) to $\max_{\gamma\in\Pi(a,b)} \big|\sum_{i,j} x_i y_j \gamma_{ij}\big|$, which is also equivalent to solving $\max_\gamma \sum_{i,j} x_i y_j\gamma_{ij}$ or $\max_\gamma \sum_{i,j} (-x_i) y_j\gamma_{ij}$ and keeping the larger value, or equivalently:
$$\min_{\gamma\in\Pi(a,b)} \sum_{i,j} c_{ij}\,\gamma_{ij},\quad \text{with } c_{ij} = \mp x_i y_j. \quad (9)$$
We have two cases to consider. If the maximum is attained by the first problem, we have to solve (9) with $c_{ij} = -x_i y_j$. Since the points are sorted decreasingly, the matrix $c$ satisfies the Monge property [33]:
$$\forall i < i',\ j < j',\quad c_{ij} + c_{i'j'} \le c_{ij'} + c_{i'j}. \quad (10)$$
To see this, check that:
$$c_{ij} + c_{i'j'} - c_{ij'} - c_{i'j} = -(x_i - x_{i'})(y_j - y_{j'}) \le 0. \quad (11)$$
In this case, the North-West corner rule defined in Algorithm 1 is known to produce an optimal solution to the linear problem (9) [33]. If the maximum is attained by the second problem, then changing $x$ to $-x$ concludes. □
We emphasize that this result is novel and generalizes [12] in the sense that the distributions do not need to have uniform weights and the same number of points. In addition, Theorem 2 is not directly applicable to this setting since it requires absolutely continuous distributions, which is not the case here. Both results are, however, related, as the solution obtained by using the NW corner rule on the sorted samples is the same as that obtained by considering the coupling obtained from the quantile functions. The previous result could also be used to define tractable alternatives to GW in the same manner as the Sliced Gromov–Wasserstein [12]; we sketch this idea below.
3.5. Illustrations
We use the Python Optimal Transport (POT) library [34] to compute the different optimal transport problems involved in this illustration. We are interested here in solving a 3D mesh registration problem, which is a natural application of Gromov–Wasserstein [3] since it enjoys invariances with respect to isometries such as permutations and can also naturally exploit the topology of the meshes. For this purpose, we selected two base meshes from the FAUST dataset [35] and used the Fiedler vector [36] of each mesh graph as a one-dimensional subspace for the subspace detour, comparing the resulting registration against Gromov–Wasserstein mappings computed from adjacency, heat kernel, and geodesic distance matrices (see Figure 2).
4. Triangular Coupling as Limit of Optimal Transport Plans for Quadratic Cost
Another interesting property of the Monge–Knothe coupling derived in Muzellec and Cuturi [14] is that it can be obtained as the limit of classical optimal transport plans, similar to Theorem 1, using a separable cost of the form:
$$c_t(x,y) = \|P_E(x-y)\|_2^2 + t\,\|P_{E^\perp}(x-y)\|_2^2,$$
with $t \to 0$ and where the projections are expressed in an orthonormal basis of $E \oplus E^\perp$. However, this property is not valid for the classical Gromov–Wasserstein cost (e.g., with $c_X(x,x') = \|x-x'\|_2^2$ or $c_X(x,x') = \langle x,x'\rangle$), as the cost is not separable. Motivated by this question, we ask in the following whether we can derive a quadratic optimal transport cost for which we would have this property.
Formally, we derive a new quadratic optimal transport problem using the Hadamard product. We show that this problem is well defined and that it has interesting properties, such as invariance with respect to axial symmetries. We also show that it can be related to a triangular coupling in a similar fashion to the classical optimal transport problem with the Knothe–Rosenblatt rearrangement.
4.1. Construction of the Hadamard–Wasserstein Problem
In this part, we define the "Hadamard–Wasserstein" problem between $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ as:
$$HW^2(\mu,\nu) = \inf_{\gamma\in\Pi(\mu,\nu)} \iint \|x\odot x' - y\odot y'\|_2^2\,\mathrm{d}\gamma(x,y)\,\mathrm{d}\gamma(x',y'), \quad (12)$$
where ⊙ is the Hadamard product (element-wise product). This problem differs from the Gromov–Wasserstein problem in the sense that we no longer compare intra-space distances, but rather the Hadamard products between vectors of the two spaces (in the same fashion as the classical Wasserstein distance). Hence, we need the two measures to lie in the same Euclidean space. Let us note $L$ as the cost defined as:
$$L(x, x', y, y') = \|x\odot x' - y\odot y'\|_2^2. \quad (13)$$
We observe that it coincides with the inner-GW loss (4) in one dimension, since the Hadamard product of scalars is the ordinary product. Therefore, by Theorem 2, we know that we have a closed-form solution in 1D.
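A quick sanity check of the cost (13) and of its 1D coincidence with the inner-GW loss; the helper name is ours.

```python
import numpy as np

def hadamard_cost(x, xp, y, yp):
    """L(x, x', y, y') = ||x (.) x' - y (.) y'||_2^2, i.e., cost (13)."""
    return float(np.sum((x * xp - y * yp) ** 2))

# In 1D the Hadamard product is the scalar product, so L is (<x,x'> - <y,y'>)^2:
x, xp, y, yp = 1.5, -0.3, 0.7, 2.0
assert np.isclose(hadamard_cost(np.array([x]), np.array([xp]),
                                np.array([y]), np.array([yp])),
                  (x * xp - y * yp) ** 2)
```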
4.2. Properties
First, we derive some useful properties of (12), which are usual for the regular Gromov–Wasserstein problem. Formally, we show that the problem is well defined and that $HW$ is a pseudometric with invariances with respect to the axes.
Proposition 6. Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$.
-
The problem (12) always admits a minimizer.
-
$HW$ is a pseudometric (i.e., it is symmetric, non-negative, satisfies $HW(\mu,\mu) = 0$, and satisfies the triangle inequality).
-
$HW$ is invariant to reflections with respect to the axes.
Proof. The map $(x,x',y,y') \mapsto \|x\odot x' - y\odot y'\|_2^2$
is continuous; therefore, $L$ is lower semi-continuous. Hence, by applying Lemma 2.2.1 of [26], we observe that $\gamma \mapsto \iint L\,\mathrm{d}\gamma\,\mathrm{d}\gamma$ is lower semi-continuous for the weak convergence of measures.
Now, as $\Pi(\mu,\nu)$ is a compact set (see the proof of Theorem 1.7 in [19] for the Polish space case and of Theorem 1.4 for the compact metric space case) and the objective is lower semi-continuous for the weak convergence, we can apply the Weierstrass theorem (Memo 2.2.1 in [26]), which states that (12) always admits a minimizer.
For the pseudometric properties, see Theorem 16 in [25].
For the invariances, we first look at the properties that must be satisfied by a map $T$ in order to have $HW(T_\#\mu, T_\#\nu) = HW(\mu,\nu)$, namely that $T$ preserves Hadamard products: for all $x, x'$, $T(x)\odot T(x') = x\odot x'$.
We find that each coordinate map must satisfy $T_i(x)^2 = x_i^2$ because, denoting $(e_1,\dots,e_d)$ as the canonical basis and testing the product relation against the basis vectors, we obtain coordinate-wise constraints,
which implies that $T_i(x) = \pm x_i$, with a sign that cannot depend on $x$ (otherwise the product relation fails for some pairs of points); therefore, $T$ must flip a fixed subset of coordinates.
If we take for $T$ a reflection with respect to the axes, then it indeed satisfies these conditions. Moreover, this defines a proper equivalence relation, and therefore we have a distance on the quotient space. □
$HW$ loses some properties compared to GW. Indeed, it is only invariant with respect to the axes, and it can only compare measures lying in the same Euclidean space in order for the distance to be well defined. Nonetheless, we show in the following that we can derive some links with triangular couplings in the same way as the Wasserstein distance with KR.
Indeed, the cost $L$ (13) is separable and reduces to the inner-GW loss in 1D, for which we have a closed-form solution. We can therefore define a degenerated version of it:
$$L_t(x,x',y,y') = \sum_{i=1}^d \lambda_i(t)\,(x_i x'_i - y_i y'_i)^2, \quad (14)$$
with, for all $t > 0$ and all $i$, $\lambda_i(t) > 0$, $\lambda_1(t) = 1$, and, for all $i \ge 2$, $\lambda_i(t)/\lambda_{i-1}(t) \to 0$ as $t \to 0$. We denote $HW_t$ as the problem (12) with the degenerated cost (14). Therefore, we will be able to decompose the objective coordinate by coordinate and to use the same induction reasoning as [21]. Then, we can define a triangular coupling different from the Knothe–Rosenblatt rearrangement in the sense that each map will not necessarily be nondecreasing. Indeed, following Theorem 2, the solution of each 1D problem:
$$\inf_{\gamma\in\Pi(\mu_1,\nu_1)} \iint (x x' - y y')^2\,\mathrm{d}\gamma\,\mathrm{d}\gamma$$
is either the increasing or the decreasing rearrangement. Hence, at each step $k$, if we disintegrate the joint law of the $k$ first variables, the $k$-th map is chosen as the better of the increasing and decreasing rearrangements between the conditional one-dimensional measures. We now state the main theorem, where we show that the limit of the OT plans obtained with the degenerated cost is the triangular coupling we just defined.
Theorem 3. Let μ and ν be two absolutely continuous measures on $\mathbb{R}^d$ with compact supports. Let $\gamma_t$ be an optimal transport plan for $HW_t$, let $T_K$ be the alternate Knothe–Rosenblatt map between μ and ν as defined in the last paragraph, and let $\gamma_K = (\mathrm{Id}\times T_K)_\#\mu$ be the associated transport plan. Then, we have $\gamma_t \rightharpoonup \gamma_K$ weakly as $t \to 0$. Moreover, if the $\gamma_t$ are induced by transport maps $T_t$, then $T_t \to T_K$ in $L^2(\mu)$.
Proof. See Appendix B.2. □
However, we cannot extend this theorem to the subspace detour approach. Indeed, by expressing the cost in an orthonormal basis of $E \oplus E^\perp$ and degenerating the coordinates, we would project coordinate-wise on $E$ (respectively on $E^\perp$), which is generally different from the couplings between the measures projected on the subspaces (respectively on their orthogonal complements).
4.3. Solving Hadamard–Wasserstein in the Discrete Setting
In this part, we derive formulas to solve (12) numerically. Let $x_1,\dots,x_n \in \mathbb{R}^d$, $y_1,\dots,y_m \in \mathbb{R}^d$, $p \in \Sigma_n$, $q \in \Sigma_m$, and let $\mu = \sum_{i=1}^n p_i\delta_{x_i}$ and $\nu = \sum_{j=1}^m q_j\delta_{y_j}$ be two discrete measures in $\mathbb{R}^d$. The Hadamard–Wasserstein problem (12) becomes, in the discrete setting:
$$\min_{\gamma\in\Pi(p,q)} E(\gamma) = \sum_{i,j,k,l} \|x_i\odot x_k - y_j\odot y_l\|_2^2\,\gamma_{ij}\gamma_{kl},$$
with $\Pi(p,q) = \{\gamma \in \mathbb{R}_+^{n\times m} \mid \gamma\mathbb{1}_m = p,\ \gamma^\top\mathbb{1}_n = q\}$. As noted in [9], if we note $L(X,Y)\otimes\gamma$ as the tensor-matrix multiplication, then we have $E(\gamma) = \langle L(X,Y)\otimes\gamma, \gamma\rangle$, where ⊗ is defined as:
$$\big(L(X,Y)\otimes\gamma\big)_{ij} = \sum_{k,l} L(x_i, x_k, y_j, y_l)\,\gamma_{kl}.$$
We show in the next proposition a decomposition of $L(X,Y)\otimes\gamma$, which allows us to compute this tensor product more efficiently.
Proposition 7. Let $L(x,x',y,y') = \|x\odot x' - y\odot y'\|_2^2$, where $x,x',y,y' \in \mathbb{R}^d$. Let us note $X = (x_1,\dots,x_n)^\top \in \mathbb{R}^{n\times d}$, $Y = (y_1,\dots,y_m)^\top \in \mathbb{R}^{m\times d}$, $A = X\odot X$, $B = Y\odot Y$, $p = \gamma\mathbb{1}_m$, $q = \gamma^\top\mathbb{1}_n$, and $M \in \mathbb{R}^d$ with $M_s = \sum_{k,l} X_{ks}\,\gamma_{kl}\,Y_{ls}$. Then:
$$L(X,Y)\otimes\gamma = A(A^\top p)\,\mathbb{1}_m^\top + \mathbb{1}_n\,\big(B(B^\top q)\big)^\top - 2\,X\,\mathrm{diag}(M)\,Y^\top.$$
Proof. First, we can start by writing:
$$\big(L(X,Y)\otimes\gamma\big)_{ij} = \sum_{k,l}\gamma_{kl}\sum_{s=1}^d \Big((X_{is}X_{ks})^2 - 2\,X_{is}X_{ks}Y_{js}Y_{ls} + (Y_{js}Y_{ls})^2\Big).$$
We cannot directly apply Proposition 1 from [9] (as the middle term couples the coordinates through a scalar product), but by performing the same type of computation, we obtain: the first term equals $\big(A(A^\top p)\big)_i$ since it depends only on the marginal $p$; by symmetry, the third term equals $\big(B(B^\top q)\big)_j$; and the middle term equals $-2\sum_s X_{is}Y_{js}M_s$ with $M_s = \sum_{k,l}X_{ks}\gamma_{kl}Y_{ls}$. Summing the three terms gives the stated decomposition. □
From this decomposition, we can compute the tensor product with a complexity of $O(nmd)$ using only matrix multiplications (instead of $O(n^2m^2d)$ for a naive computation).
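A sketch of this computation, under our reconstruction of the proposition above (variable names A, B, M are ours); the naive reference is included only for checking on tiny inputs.

```python
import numpy as np

def hw_tensor_product(X, Y, gamma):
    """Compute L(X, Y) (x) gamma in O(nmd) via the decomposition above."""
    p = gamma.sum(axis=1)               # first marginal, shape (n,)
    q = gamma.sum(axis=0)               # second marginal, shape (m,)
    A, B = X ** 2, Y ** 2               # element-wise squares
    term1 = (A @ (A.T @ p))[:, None]    # (A (A^T p))_i, constant in j
    term3 = (B @ (B.T @ q))[None, :]    # (B (B^T q))_j, constant in i
    M = (X * (gamma @ Y)).sum(axis=0)   # M_s = x_{:,s}^T gamma y_{:,s}
    term2 = (X * M) @ Y.T               # sum_s X_is M_s Y_js
    return term1 + term3 - 2.0 * term2

def hw_tensor_naive(X, Y, gamma):
    """Naive O(n^2 m^2 d) reference, for correctness checks only."""
    n, m = gamma.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for k in range(n):
                for l in range(m):
                    out[i, j] += gamma[k, l] * ((X[i] * X[k] - Y[j] * Y[l]) ** 2).sum()
    return out
```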
For the degenerated cost function (14), we just need to replace $X$ and $Y$ by $X\,\mathrm{diag}\big(\lambda(t)\big)^{1/4}$ and $Y\,\mathrm{diag}\big(\lambda(t)\big)^{1/4}$ in the previous proposition.
To solve this problem numerically, we can use the conditional gradient algorithm (Algorithm 2 in [41]). This algorithm only requires computing the gradient:
$$\nabla E(\gamma) = 2\,L(X,Y)\otimes\gamma$$
at each step, together with solving a classical linear OT problem. This is more efficient than solving the quadratic problem directly and, while the problem is nonconvex, it actually converges to a local stationary point [42]. A minimal sketch of such a solver is given below.
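This sketch reuses hw_tensor_product from the previous snippet; the fixed Frank–Wolfe step size and iteration budget are our simplifications (the conditional gradient of [41] additionally uses a line search).

```python
import numpy as np
import ot  # POT, for the linear OT subproblem at each iteration

def hadamard_wasserstein_cg(X, Y, p, q, n_iter=100):
    """Frank-Wolfe on gamma -> <L(X,Y) (x) gamma, gamma> over Pi(p, q)."""
    gamma = np.outer(p, q)                            # feasible initialization
    for it in range(1, n_iter + 1):
        grad = 2.0 * hw_tensor_product(X, Y, gamma)   # gradient of the quadratic
        direction = ot.emd(p, q, grad)                # linear minimization oracle
        step = 2.0 / (it + 2.0)                       # classical FW schedule
        gamma = (1.0 - step) * gamma + step * direction
    return gamma
```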
In Figure 3, we generated 30 points from 2 Gaussian distributions and computed the optimal coupling of $HW_t$ for several values of $t$. These points have the same uniform weight. We plot the couplings between the points on the second row, and between the points projected on their first coordinate on the first row. Note that for discrete points, the Knothe–Rosenblatt coupling amounts to sorting the points with respect to the first coordinate if there is no ambiguity (i.e., if the first coordinates are pairwise distinct), as it comes back to performing the optimal transport in one dimension [43] (Remark 2.28). For our cost, the optimal coupling in 1D can be either the increasing or the decreasing rearrangement. We observe on the first row of Figure 3 that the optimal coupling when $t$ is close to 0 corresponds to the decreasing rearrangement, which corresponds well to the alternate Knothe–Rosenblatt map we defined in Section 4.2. This illustrates the result of Theorem 3.
5. Discussion
We proposed in this work to extend the subspace detour approach to different subspaces and to other optimal transport costs such as Gromov–Wasserstein. Being able to project on different subspaces can be useful when the data are not aligned and do not share the same axes of interest, as well as when we are working between different metric spaces, as is the case, for example, with graphs. However, a question that arises is how to choose these subspaces. Since the method is mostly interesting when we choose one-dimensional subspaces, we proposed to use a PCA and to project on the first directions for data embedded in Euclidean spaces. For more complicated data such as graphs, we projected onto the Fiedler vector and obtained good results in an efficient way on a 3D mesh registration problem. More generally, Muzellec and Cuturi [14] proposed to perform a gradient descent on the loss with respect to orthonormal matrices. This approach is non-convex and is only guaranteed to converge to a local minimum. Designing such an algorithm, which would minimize alternately between two transformations in the Stiefel manifold, is left for future work.
The subspace detour approach for transport problems is meaningful whenever one can identify subspaces that gather most of the information of the original distributions, while making the estimates more robust and endowed with a better sample complexity, since the dimensions are lower. On the computational complexity side, when we only have access to discrete data, the subspace detour approach brings a better computational complexity solely when the subspaces are chosen to be one-dimensional; otherwise, we have the same complexity for solving the subspace detour as for solving the OT problem directly (since the complexity only depends on the number of samples). In this case, the 1D projection often gives distinct values for all the samples (for continuous-valued data), and hence the Monge–Knothe coupling is exactly the coupling in 1D. As such, information is lost on the orthogonal spaces. It can be artificially recovered by quantizing the 1D values (as experimented in practice in [14]), but the added value is not clear and deserves broader studies. For distributions that are absolutely continuous w.r.t. the Lebesgue measure, however, this limitation does not exist, but this setting comes with the extra cost of being able to compute the projected measures onto the subspaces efficiently, which might require discretization of the space and is therefore not practical in high dimension.
We also proposed a new quadratic cost that we call Hadamard–Wasserstein, which allows us to define a degenerated cost for which the optimal transport plan converges to a triangular coupling. However, this cost loses many properties compared to $W_2$ or GW, which limits its use. Indeed, while $HW$ is a quadratic cost, it uses a Euclidean norm between the Hadamard products of vectors and requires the two spaces to be the same (in order for the distance to be well defined). A workaround in the case $\mu \in \mathcal{P}(\mathbb{R}^p)$ and $\nu \in \mathcal{P}(\mathbb{R}^q)$ with $p < q$ would be to "lift" the vectors of $\mathbb{R}^p$ into vectors of $\mathbb{R}^q$ with padding, as proposed in [12], or to project the vectors of $\mathbb{R}^q$ onto $\mathbb{R}^p$, as in [6]. Yet, for some applications where only the distance/similarity matrices are available, a different strategy still needs to be found. Another concern is the limited invariance properties (only with respect to axial symmetry in our case). Nevertheless, we expect that such a cost can be of interest in cases where invariance to symmetry is a desired property, such as in [44].
Author Contributions: Methodology, C.B., N.C., T.V., F.S., and L.D.; software, C.B. and N.C.; writing—original draft, C.B., N.C., T.V., F.S., and L.D. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by project DynaLearn from Labex CominLabs and Région Bretagne ARED DLearnMe. N.C. acknowledges funding from the ANR OTTOPIA AI chair (ANR-20-CHIA-0030). T.V. was supported in part by the AllegroAssai ANR project (ANR-19-CHIA-0009) and by the ACADEMICS grant of the IDEXLYON, project of the Université de Lyon, PIA operated by ANR-16-IDEX-0005.
Data Availability Statement: The FAUST dataset might be found at
Conflicts of Interest: The authors declare no conflict of interest.
The following abbreviations are used in this manuscript:
OT | Optimal Transport |
GW | Gromov–Wasserstein |
KR | Knothe–Rosenblatt |
MI | Monge–Independent |
MK | Monge–Knothe |
PCA | Principal Component Analysis |
POT | Python Optimal Transport |
Figure 1. From left to right: Data (moons); OT plan obtained with GW for the squared Euclidean cost; Data projected on the first axis; OT plan obtained between the projected measures; Data projected on their first PCA component; OT plan obtained between the projected measures.
Figure 2. Three-dimensional mesh registration. (First row) source and target meshes, color code of the source, ground truth color code on the target, result of subspace detour using Fiedler vectors as subspace. (Second row) After recalling the expected ground truth for ease of comparison, we present results of different Gromov–Wasserstein mappings obtained with metrics based on adjacency, heat kernel, and geodesic distances.
Figure 3. Degenerated coupling. On the first row, the points are projected on their first coordinate and we plot the optimal coupling. On the second row, we plot the optimal coupling between the original points.
Appendix A. Subspace Detours
Appendix A.1. Proofs
We first deal with the quadratic cost $c_X(x,x') = \|x-x'\|_2^2$. Let $T$ be an isometry on $E^\perp$ and $\tilde T: x \mapsto \big(x_E, T(x_{E^\perp})\big)$.
From Lemma 6 of Paty and Cuturi [27], the map $\gamma \mapsto (\tilde T\times\mathrm{Id})_\#\gamma$ is a bijection between $\Pi_{E,F}(\mu,\nu)$ and $\Pi_{E,F}(\tilde T_\#\mu,\nu)$.
Now, for all $\gamma' \in \Pi_{E,F}(\tilde T_\#\mu,\nu)$, there exists $\gamma \in \Pi_{E,F}(\mu,\nu)$ such that $\gamma' = (\tilde T\times\mathrm{Id})_\#\gamma$.
For such couplings, since $\tilde T$ leaves the $E$ component unchanged and acts as an isometry on $E^\perp$, we have $\|\tilde T(x)-\tilde T(x')\|_2^2 = \|x-x'\|_2^2$, so the integrand of the GW objective is unchanged.
By integrating with respect to the couplings, the objectives of $\gamma$ and $\gamma'$ coincide. Now, by taking the infimum with respect to the couplings, we obtain $GW_{E,F}(\tilde T_\#\mu,\nu) = GW_{E,F}(\mu,\nu)$.
For the inner product case, we can do the same proof for linear isometries on $E^\perp$, which preserve $\langle x, x'\rangle$.
Appendix A.2. Closed-Form between Gaussians
Let $\mu = \mathcal{N}(m_\mu,\Sigma) \in \mathcal{P}(\mathbb{R}^p)$ and $\nu = \mathcal{N}(m_\nu,\Lambda) \in \mathcal{P}(\mathbb{R}^q)$.
Let $E \subset \mathbb{R}^p$ and $F \subset \mathbb{R}^q$ be $k$- and $k'$-dimensional subspaces.
We represent $\Sigma$ and $\Lambda$ in orthonormal bases of $E\oplus E^\perp$ and $F\oplus F^\perp$, as in Section 3.3, with the Schur complements $\Sigma/\Sigma_E$ and $\Lambda/\Lambda_F$ as the covariances of the conditionals.
Appendix A.2.1. Quadratic GW Problem
For the Gaussian-restricted problem (6), Salmona et al. [30] showed that an optimal solution is of the form (7).
Since the problem is translation invariant, we can always solve the problem between the centered measures.
In the following, we suppose that $m_\mu = 0$ and $m_\nu = 0$.
We know that the Monge–Knothe transport map will be a linear map $T_{MK}(x) = Bx$, with $B$ block triangular in the chosen bases, whose diagonal blocks are the optimal maps $T_E$ (between $\mu_E$ and $\nu_F$) and $T_\perp$ (between the conditionals, whose covariances are the Schur complements).
First, we verify that $B$ pushes $\mu$ forward to $\nu$, which reduces to the block equation $B\Sigma B^\top = \Lambda$. The diagonal blocks of this equation are satisfied by the choice of $T_E$ and $T_\perp$; the two remaining (off-diagonal) equations determine $C$.
Then, $T_{MK}$ is well defined and is the Monge–Knothe map announced in Proposition 3. □
Appendix A.2.2. Closed-Form between Gaussians for Monge–Independent
Suppose $\mu = \mathcal{N}(0,\Sigma)$ and $\nu = \mathcal{N}(0,\Lambda)$, and let $T$ be an optimal transport map between $\mu_E$ and $\nu_F$.
For the Monge–Independent plan, $\pi_{MI} = \gamma^*_{E\times F}\otimes\big(\mu_{E^\perp|x_E}\otimes\nu_{F^\perp|y_F}\big)$, with $\gamma^*_{E\times F} = (\mathrm{Id}\times T)_\#\mu_E$.
Let us assume that all blocks are expressed in coordinates adapted to $E\oplus E^\perp$ and $F\oplus F^\perp$.
Hence, the plan is Gaussian, and computing it amounts to computing the cross-covariance $K$ between the two marginals, the conditional covariances being given by the Schur complements.
We also have that the conditional means are linear in the subspace variables, which gives the off-diagonal blocks of $K$ in terms of $T$, $\Sigma$, and $\Lambda$.
Finally, we find the covariance $\Gamma = \begin{pmatrix}\Sigma & K\\ K^\top & \Lambda\end{pmatrix}$ announced in Proposition 4.
By taking orthogonal bases of $E\oplus E^\perp$ and $F\oplus F^\perp$, the expressions simplify.
To check the result, just expand the terms and see that the marginals of $\mathcal{N}(0,\Gamma)$ are $\mu$ and $\nu$ and that its projection on $E\times F$ is $\gamma^*_{E\times F}$. □
Appendix B. Knothe–Rosenblatt
Appendix B.1. Properties of (12)
In a slightly more general setting, let the loss be any continuous quadratic interaction cost as in (12); the pseudometric properties of (12) then follow from the general results on Gromov-type distances between networks [25].
See Theorem 3.1 in [25]. □
Appendix B.2. Proof of Theorem 3
We first recall a useful theorem on weak convergence (Theorem 2.8 in Billingsley [45]), which allows us to deduce the weak convergence of a sequence of measures from the convergence of its subsequences.
The following proof is mainly inspired by the proof of Theorem 1 in [21].
Let $\gamma_t$ be an optimal plan for $HW_t$ and let $\gamma_K$ be the plan associated with the alternate Knothe–Rosenblatt map. First, let us denote by $\gamma$ the weak limit of a subsequence of $(\gamma_t)_t$.
Part 1:
First, let us notice that, by optimality, the cost of $\gamma_t$ is no greater than that of $\gamma_K$. Moreover, as $\lambda_1(t) = 1$ while the other weights $\lambda_i(t)$ vanish when $t \to 0$, the term on the first coordinates dominates. By denoting the projections of the plans on the first coordinates and passing to the limit, we obtain that the projection of $\gamma$ must be optimal for the 1D problem between the first marginals; however, so is the projection of $\gamma_K$.
Part 2:
We know that, for any $t$, the cost of $\gamma_t$ is sandwiched between the optimal value and the value of $\gamma_K$. Therefore, we have the following inequality chain, from which we can subtract the first term and factorize by $\lambda_2(t)$. By dividing by $\lambda_2(t)$ and letting $t \to 0$, the two remaining terms depend only on the second coordinates, conditionally on the first ones. We can rewrite the previous inequality in terms of the disintegrations. Now, we will assume at first that the marginals of the disintegrations converge; we deduce that the disintegration of $\gamma$ with respect to the first coordinates is optimal between the conditional measures. Now, we still have to show that the marginals of the disintegrations indeed converge; first, we can use the projections together with the theorem of Billingsley recalled above.
Part 3:
Now, we can proceed the same way by induction over the coordinates. For this part of the proof, we rely on [21]. Hence, we have, at each order, optimality of the conditional couplings; as before, this follows by subtracting the dominant terms, dividing by the next weight, and passing to the limit. For the continuity arguments, we can apply the corresponding results of [21]. By taking the limit $t \to 0$, we can now disintegrate with respect to the first $k$ coordinates and identify the conditionals of $\gamma$ at every order.
Part 4:
Therefore, we have indeed $\gamma = \gamma_K$ for every weak limit, which proves that $\gamma_t \rightharpoonup \gamma_K$. When the $\gamma_t$ are induced by maps $T_t$, the convergence of $T_t$ to $T_K$ in $L^2(\mu)$ follows as in [21]. □
References
1. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. Proceedings of the International Conference on Machine Learning, PMLR; Sydney, Australia, 6–11 August 2017; pp. 214-223.
2. Courty, N.; Flamary, R.; Tuia, D.; Rakotomamonjy, A. Optimal transport for domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell.; 2016; 39, pp. 1853-1865. [DOI: https://dx.doi.org/10.1109/TPAMI.2016.2615921] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27723579]
3. Mémoli, F. Gromov–Wasserstein distances and the metric approach to object matching. Found. Comput. Math.; 2011; 11, pp. 417-487. [DOI: https://dx.doi.org/10.1007/s10208-011-9093-5]
4. Chowdhury, S.; Miller, D.; Needham, T. Quantized Gromov–Wasserstein. Proceedings of the Machine Learning and Knowledge Discovery in Databases, Research Track—European Conference, ECML PKDD 2021; Bilbao, Spain, 13–17 September 2021; Proceedings, Part III Oliver, N.; Pérez-Cruz, F.; Kramer, S.; Read, J.; Lozano, J.A. Springer: Berlin/Heidelberg, Germany, 2021; Volume 12977, pp. 811-827. [DOI: https://dx.doi.org/10.1007/978-3-030-86523-8_49]
5. Alvarez-Melis, D.; Jegelka, S.; Jaakkola, T.S. Towards optimal transport with global invariances. Proceedings of the The 22nd International Conference on Artificial Intelligence and Statistics, PMLR; Naha, Japan, 16–18 April 2019; pp. 1870-1879.
6. Cai, Y.; Lim, L.H. Distances between probability distributions of different dimensions. arXiv; 2020; arXiv: 2011.00629
7. Mémoli, F. On the use of Gromov-Hausdorff Distances for Shape Comparison. Proceedings of the 4th Symposium on Point Based Graphics, PBG@Eurographics 2007; Prague, Czech Republic, 2–3 September 2007; Botsch, M.; Pajarola, R.; Chen, B.; Zwicker, M. Eurographics Association: Geneva, Switzerland, 2007; pp. 81-90. [DOI: https://dx.doi.org/10.2312/SPBG/SPBG07/081-090]
8. Sturm, K.T. The space of spaces: Curvature bounds and gradient flows on the space of metric measure spaces. arXiv; 2012; arXiv: 1208.0434
9. Peyré, G.; Cuturi, M.; Solomon, J. Gromov–Wasserstein averaging of kernel and distance matrices. Proceedings of the International Conference on Machine Learning, PMLR; New York, NY, USA, 19–24 June 2016; pp. 2664-2672.
10. Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. Adv. Neural Inf. Process. Syst.; 2013; 26, pp. 2292-2300.
11. Scetbon, M.; Peyré, G.; Cuturi, M. Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs. arXiv; 2021; arXiv: 2106.01128
12. Vayer, T.; Flamary, R.; Courty, N.; Tavenard, R.; Chapel, L. Sliced Gromov–Wasserstein. Adv. Neural Inf. Process. Syst.; 2019; 32, pp. 14753-14763.
13. Fatras, K.; Zine, Y.; Majewski, S.; Flamary, R.; Gribonval, R.; Courty, N. Minibatch optimal transport distances; analysis and applications. arXiv; 2021; arXiv: 2101.01792
14. Muzellec, B.; Cuturi, M. Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections. Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019; Vancouver, BC, Canada, 8–14 December 2019; Wallach, H.M.; Larochelle, H.; Beygelzimer, A.; d’Alché-Buc, F.; Fox, E.B.; Garnett, R. 2019; pp. 6914-6925.
15. Bogachev, V.I.; Kolesnikov, A.V.; Medvedev, K.V. Triangular transformations of measures. Sb. Math.; 2005; 196, 309. [DOI: https://dx.doi.org/10.1070/SM2005v196n03ABEH000882]
16. Knothe, H. Contributions to the theory of convex bodies. Mich. Math. J.; 1957; 4, pp. 39-52. [DOI: https://dx.doi.org/10.1307/mmj/1028990175]
17. Rosenblatt, M. Remarks on a multivariate transformation. Ann. Math. Stat.; 1952; 23, pp. 470-472. [DOI: https://dx.doi.org/10.1214/aoms/1177729394]
18. Villani, C. Optimal Transport: Old and New; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008; Volume 338.
19. Santambrogio, F. Optimal Transport for Applied Mathematicians; Birkhäuser: New York, NY, USA, 2015; 55, 94.
20. Jaini, P.; Selby, K.A.; Yu, Y. Sum-of-Squares Polynomial Flow. Proceedings of the 36th International Conference on Machine Learning, PMLR; Long Beach, CA, USA, 9–15 June 2019; pp. 3009-3018.
21. Carlier, G.; Galichon, A.; Santambrogio, F. From Knothe’s transport to Brenier’s map and a continuation method for optimal transport. SIAM J. Math. Anal.; 2010; 41, pp. 2554-2576. [DOI: https://dx.doi.org/10.1137/080740647]
22. Bonnotte, N. Unidimensional and Evolution Methods for Optimal Transportation. Ph.D. Thesis; Université Paris-Sud: Paris, France, 2013.
23. Ambrosio, L.; Gigli, N.; Savaré, G. Gradient Flows: In Metric Spaces and in the Space of Probability Measures; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008.
24. Niles-Weed, J.; Rigollet, P. Estimation of Wasserstein distances in the spiked transport model. arXiv; 2019; arXiv: 1909.07513
25. Chowdhury, S.; Mémoli, F. The Gromov–Wasserstein distance between networks and stable network invariants. Inf. Inference A J. IMA; 2019; 8, pp. 757-787. [DOI: https://dx.doi.org/10.1093/imaiai/iaz026]
26. Vayer, T. A Contribution to Optimal Transport on Incomparable Spaces. Ph.D. Thesis; Université de Bretagne Sud: Vannes, France, 2020.
27. Paty, F.P.; Cuturi, M. Subspace robust wasserstein distances. Proceedings of the 36th International Conference on Machine Learning, PMLR; Long Beach, CA, USA, 9–15 June 2019; pp. 5072-5081.
28. Rasmussen, C.E. Gaussian processes in machine learning. Summer School on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2003; pp. 63-71.
29. Von Mises, R. Mathematical Theory of Probability and Statistics; Academic Press: Cambridge, MA, USA, 1964.
30. Salmona, A.; Delon, J.; Desolneux, A. Gromov–Wasserstein Distances between Gaussian Distributions. arXiv; 2021; arXiv: 2104.07970
31. Weed, J.; Bach, F. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli; 2019; 25, pp. 2620-2648. [DOI: https://dx.doi.org/10.3150/18-BEJ1065]
32. Lin, T.; Zheng, Z.; Chen, E.; Cuturi, M.; Jordan, M. On projection robust optimal transport: Sample complexity and model misspecification. Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR; Virtual, 13–15 April 2021; pp. 262-270.
33. Burkard, R.E.; Klinz, B.; Rudolf, R. Perspectives of Monge Properties in Optimization. Discret. Appl. Math.; 1996; 70, pp. 95-161. [DOI: https://dx.doi.org/10.1016/0166-218X(95)00103-X]
34. Flamary, R.; Courty, N.; Gramfort, A.; Alaya, M.Z.; Boisbunon, A.; Chambon, S.; Chapel, L.; Corenflos, A.; Fatras, K.; Fournier, N. et al. POT: Python Optimal Transport. J. Mach. Learn. Res.; 2021; 22, pp. 1-8.
35. Bogo, F.; Romero, J.; Loper, M.; Black, M.J. FAUST: Dataset and evaluation for 3D mesh registration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Columbus, OH, USA, 23–28 June 2014; pp. 3794-3801.
36. Fiedler, M. Algebraic connectivity of graphs. Czechoslov. Math. J.; 1973; 23, pp. 298-305. [DOI: https://dx.doi.org/10.21136/CMJ.1973.101168]
37. Hagberg, A.; Swart, P.; Chult, D.S. Exploring Network Structure, Dynamics, and Function Using NetworkX; Technical Report Los Alamos National Lab. (LANL): Los Alamos, NM, USA, 2008.
38. Wu, J.P.; Song, J.Q.; Zhang, W.M. An efficient and accurate method to compute the Fiedler vector based on Householder deflation and inverse power iteration. J. Comput. Appl. Math.; 2014; 269, pp. 101-108. [DOI: https://dx.doi.org/10.1016/j.cam.2014.03.018]
39. Xu, H.; Luo, D.; Carin, L. Scalable Gromov–Wasserstein learning for graph partitioning and matching. Adv. Neural Inf. Process. Syst.; 2019; 32, pp. 3052-3062.
40. Chowdhury, S.; Needham, T. Generalized spectral clustering via Gromov–Wasserstein learning. Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR; Virtual, 13–15 April 2021; pp. 712-720.
41. Vayer, T.; Courty, N.; Tavenard, R.; Flamary, R. Optimal transport for structured data with application on graphs. Proceedings of the 36th International Conference on Machine Learning, PMLR; Long Beach, CA, USA, 9–15 June 2019; pp. 6275-6284.
42. Lacoste-Julien, S. Convergence rate of Frank–Wolfe for non-convex objectives. arXiv; 2016; arXiv: 1607.00345
43. Peyré, G.; Cuturi, M. Computational optimal transport: With applications to data science. Found. Trends® Mach. Learn.; 2019; 11, pp. 355-607. [DOI: https://dx.doi.org/10.1561/2200000073]
44. Nagar, R.; Raman, S. Detecting approximate reflection symmetry in a point set using optimization on manifold. IEEE Trans. Signal Process.; 2019; 67, pp. 1582-1595. [DOI: https://dx.doi.org/10.1109/TSP.2019.2893835]
45. Billingsley, P. Convergence of Probability Measures; John Wiley & Sons: Hoboken, NJ, USA, 2013.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
In the context of optimal transport (OT) methods, the subspace detour approach was recently proposed by Muzellec and Cuturi. It consists of first finding an optimal plan between the measures projected on a wisely chosen subspace and then completing it in a nearly optimal transport plan on the whole space. The contribution of this paper is to extend this category of methods to the Gromov–Wasserstein problem, which is a particular type of OT distance involving the specific geometry of each distribution. After deriving the associated formalism and properties, we give an experimental illustration on a shape matching problem. We also discuss a specific cost for which we can show connections with the Knothe–Rosenblatt rearrangement.
1 Laboratoire de Mathématiques de Bretagne Atlantique, Université Bretagne Sud, CNRS UMR 6205, 56000 Vannes, France
2 ENS Lyon, CNRS UMR 5668, LIP, 69342 Lyon, France
3 Department of Computer Science, Université Bretagne Sud, CNRS UMR 6074, IRISA, 56000 Vannes, France
4 IMT Atlantique, CNRS UMR 6285, Lab-STICC, 29238 Brest, France