1. Introduction
This paper considers the problem of adaptively reconstructing a monotonically increasing function F† from imperfect pointwise observations of this function. In the statistical literature, the problem of estimating a monotone function is commonly known as isotonic regression, and it is assumed that the observed data consist of noisy pointwise evaluations of F†. However, we consider this problem under assumptions that differ from the standard formulation, and these differences motivate our algorithmic approach. To be concrete, our two motivating examples are that
F†(x) := P_{Ξ∼μ}[g(Ξ) ≤ x] (1)
is the cumulative distribution function (CDF) of a known real-valued function g of a random variable Ξ with known distribution μ, or that
F†(x) := sup_{(g,μ)∈A} P_{Ξ∼μ}[g(Ξ) ≤ x] (2)
is the supremum of a family of such CDFs over some class A. We assume that we have access to a numerical optimisation routine that can, for each x and some given numerical parameters q (e.g., the number of iterations or other convergence tolerance parameters), produce a numerical estimate or observation G(x, q) of F†(x); furthermore, we assume that G(x, q) ≤ F†(x) always holds, i.e., the numerical optimisation routine always under-estimates the true optimum value, and that the positive error F†(x) − G(x, q) can be controlled to some extent through the choice of the optimisation parameters q, but remains essentially influenced by randomness in the optimisation algorithm for each x. The assumption G(x, q) ≤ F†(x) is, for example, coherent with either Equation (1), which may be approached by increasing the number of samples (say q) in a Monte Carlo simulation, or Equation (2), which is a supremum over a set that may be explored only partially by the algorithm.
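To fix ideas, the following Python sketch implements one such observation process for a toy instance of Equation (2): the supremum is taken over a family of shifted Gaussian CDFs, and a random search explores only q members of the family, so that G(x, q) ≤ F†(x) always holds and the one-sided error decreases stochastically with q. The specific family and search strategy are illustrative assumptions, not the routines used later in the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def F_dagger(x):
    # Toy target: F†(x) = sup over t in [-1, 1] of P[Xi - t <= x]
    # with Xi ~ N(0, 1), i.e. sup of Phi(x + t), attained at t = 1.
    return norm.cdf(x + 1.0)

def G(x, q):
    # Imperfect observation: explore only q randomly drawn members of
    # the family, so G(x, q) <= F_dagger(x) always, and the error
    # F_dagger(x) - G(x, q) shrinks (stochastically) as q grows.
    ts = rng.uniform(-1.0, 1.0, size=int(q))
    return float(np.max(norm.cdf(x + ts)))
```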
A single observation G(x, q) yields only limited information about F†(x); a key limitation is that one may not even know a priori how accurate G(x, q) is. Naturally, one may repeatedly evaluate G at x, perhaps with different values of the optimisation parameters q, in order to estimate F†(x) more accurately. However, a key observation is that a suite of observations G(x_i, q_i), i = 1, ⋯, I, contains much more information than simply the estimates of F†(x_i), i = 1, ⋯, I, and this information can and must be used. For example, suppose that the values (G(x_i, q_i))_{i=1}^{I} are not increasing, e.g., because
G(x_i, q_i) > G(x_{i′}, q_{i′})  and  x_i < x_{i′}.
Such a suite of observations would be inconsistent with the axiomatic requirement that F† is an increasing function. In particular, while the observation at x_i may be relatively good or bad on its own merits, the observation G(x_{i′}, q_{i′}) at x_{i′}, which violates monotonicity, is in some sense “useless”, as it gives no better lower bound on F†(x_{i′}) than the observation at x_i does. The observation at x_{i′} is thus a good candidate for repetition with more stringent optimisation parameters q, and this is not something that could have been known without comparing it to the rest of the data set.
The purpose of this article is to leverage this and similar observations to define an algorithm for the reconstruction of the function F†, repeating old observations of insufficient quality and introducing new ones as necessary. The principal parameter in the algorithm is an “exchange rate” E that quantifies the degree to which the algorithm prefers a few high-quality evaluations over many poor-quality evaluations. Our approach is slightly different from classical isotonic (or monotonic) regression, which is understood as the least-squares fitting of an increasing function to a set of points in the plane. The latter problem is uniquely solvable, and its solution can be constructed by the pool adjacent violators algorithm (PAVA) extensively studied in Barlow et al. [1]. This algorithm explores the data set from left to right until the monotonicity condition is violated, and replaces the offending observations by their average, back-averaging to the left if needed to maintain monotonicity. Extensions of the PAVA have been developed by de Leeuw et al. [2] to consider non-least-squares loss functions and repeated observations, by Tibshirani et al. [3] to consider “nearly isotonic” or “nearly convex” fits, and by Jordan et al. [4] to consider general loss functions and partially ordered data sets. Useful references on isotonic regression also include Robertson et al. [5] and Groeneboom and Jongbloed [6].
The remainder of this paper is structured as follows. Section 2 presents the problem description and notation, after which the proposed adaptive algorithm for the reconstruction of F† is presented in Section 3. We establish the convergence properties of the algorithm in Section 3.2 and study its performance on several analytically tractable test cases in Section 4. Section 5 details the application of the algorithm to a challenging problem of the form of Equation (2) drawn from aerodynamic design. Some closing remarks are given in Section 6.
2. Notation and Problem Description
In the following, the “ground truth” response function that we wish to reconstruct is denoted by F†: [a, b] → R and has inputs x ∈ [a, b] ⊂ R. It is assumed that F† is monotonically increasing and non-constant on [a, b]. In contrast, G: [a, b] × R_+ → R denotes the numerical process used to obtain an imperfect pointwise observation y of F†(x) at some point x ∈ [a, b] for some numerical parameter q ∈ R_+. Here, on a heuristic level, q > 0 stands for the “quality” of the noisy evaluation G(x, q).
The main aim of this paper is to show the effectiveness of the proposed algorithm for the adaptive reconstruction of F†, which may or may not be continuous, from imperfect pointwise observations G(x_i, q_i) of F†, where we are free to choose x_{i+1} and q_{i+1} adaptively based on x_j, q_j, and G(x_j, q_j) for j ≤ i.
First, we associate with the I imperfect pointwise observations {x_i, y_i := G(x_i, q_i)}_{i=1}^{I} ⊂ [a, b] × R a set of positive numbers {q_i}_{i=1}^{I} ⊂ R_+, which we will call qualities. The quality q_i quantifies the confidence we have in the pointwise observation y_i of F†(x_i) obtained using the numerical process G(x_i, q_i): the higher this value, the greater the confidence. We write this quality as the product of two numbers c_i and r_i, q_i = c_i × r_i, with the following definitions:
- Consistency c_i ∈ {0, 1}: This describes the fact that two successive points must be monotonically consistent with each other. That is, if one takes two input values x_2 > x_1, one should have y_2 ≥ y_1, since the target function is monotonically increasing. There is no consistency associated with the very first data point, as it has no predecessor.
- Reliability r_i ∈ R_+: This describes how confident we are in the numerical value itself. Typically, it will be related to some error estimator, if one is available, or to the choice of optimisation parameters. It is expected that the higher the reliability, the closer the pointwise observation is to the true value, on average.
Typically, if the observation y_{i+1} = G(x_{i+1}, q_{i+1}) is consistent with respect to the observation y_i = G(x_i, q_i), where x_{i+1} > x_i, then the quality associated with y_{i+1} is q_{i+1} = r_{i+1} ∈ R_+^*, since c_{i+1} = 1 in this case. If the value is not consistent, we have q_{i+1} = r_{i+1} × c_{i+1} = 0. Finally, if x = a there is no notion of consistency, as there is no point preceding it; the quality associated with this point is therefore equal to its reliability alone.
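In code, this bookkeeping amounts to a single left-to-right scan; the following sketch assumes the observations are stored in arrays sorted by x (the function and variable names are ours):

```python
import numpy as np

def qualities(y, r):
    # q_i = c_i * r_i: the consistency c_i is 1 if y_i >= y_{i-1} and
    # 0 otherwise; the first point has no predecessor, so q_1 = r_1.
    q = np.asarray(r, dtype=float).copy()
    for i in range(1, len(y)):
        if y[i] < y[i - 1]:   # monotonicity violated: c_i = 0
            q[i] = 0.0
    return q
```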
Moreover, we associate to these pointwise observations a notion of area, illustrated in Figure 1 and defined as follows. Consider two consecutive points x_i and x_{i+1} with their respective observations y_i and y_{i+1}; the area a_i for these two points is
a_i = (x_{i+1} − x_i) × (y_{i+1} − y_i). (3)
Thus, we can define a vector a = {a_i}_{i=1}^{I−1} which contains all the computed areas for the whole data set. In addition, we can assert that if we take two points x_1 and x_2 > x_1 with y_1 = F†(x_1) and y_2 = F†(x_2) (namely, the error at these points is zero), then the graph of the ground truth function F† must lie within the rectangular area spanned by the two points (x_1, F†(x_1)) and (x_2, F†(x_2)).
To adopt a conservative point of view, we choose as the approximating function F of F† a piecewise constant interpolation function, namely:
F(x) = ∑_{i=1}^{I−1} y_i 1_{[x_i, x_{i+1})}(x),
where 1_I denotes the indicator function of the interval I. We do not want this interpolation function to overestimate the true function F†, as we know that the numerical estimate in our case always underestimates the ground truth F†(x). See Figure 1 for an illustration of this choice, which can be viewed as a worst-case approach. Indeed, this interpolation function is the worst possible function underestimating F† given two points x_1 and x_2 and their respective observations y_1 and y_2.
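For instance, the conservative reconstruction can be evaluated as follows (a minimal sketch assuming sorted numpy arrays; the helper name is ours):

```python
import numpy as np

def F_hat(t, x, y):
    # Piecewise-constant under-estimate: F(t) = y_i for t in [x_i, x_{i+1}).
    # Since y_i <= F†(x_i) <= F†(t) for t >= x_i and F† is increasing,
    # F never over-estimates the ground truth.
    idx = np.clip(np.searchsorted(x, t, side="right") - 1, 0, len(y) - 1)
    return np.asarray(y)[idx]
```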
3. Reconstruction Algorithms
The reconstruction algorithm that we propose, Algorithm 1, is driven to produce a sequence of reconstructions that converges to F† by following a principle of area minimisation: we associate to the discrete data set {x_i, y_i}_{i=1}^{I} ⊂ [a, b] × R the natural notion of area of Equation (3), as explained above, and seek to drive this area towards zero. The motivation behind this objective is Proposition 2, which states that the area converges to 0 as more points are added to the data set. However, the objective of minimising the area is complicated by the fact that evaluations of F† are imperfect. Therefore, a key user-defined parameter in the algorithm is E ∈ (0, ∞), which can be thought of as an “exchange rate” that quantifies to what extent the algorithm prefers to redo poor-quality evaluations of the target function versus driving the area measure to zero.
3.1. Algorithm
The main algorithm is organised as follows, starting from I(0) ≥ 2 points and a data set that is assumed to be consistent at the initial step n = 0. It goes through N iterations, where N is either fixed a priori or determined a posteriori once a stopping criterion is met. Note that q_new stands for the quality of a newly generated observation y_new at any new point x_new introduced by the algorithm. The latter is driven by the user-defined “exchange rate” E, as explained above. At each step n, the algorithm computes the weighted area WA(n) as the product of the minimum quality and the sum of the areas of the data points:
WA(n) = q−(n) × A(n),
where
q−(n) = min_{1≤i≤I(n)} q_i(n),  A(n) = ∑_{i=1}^{I(n)−1} a_i(n), (6)
a_i(n) is the area computed by Equation (3) at step n (see also Equation (9)), and I(n) is the number of data points. The algorithm then branches into two cases according to the value of WA(n) compared to E.
Algorithm 1: Adaptive algorithm to reconstruct a monotonically increasing function F†
Input: I(0) ≥ 2, {x_i(0), y_i(0), q_i(0)}_{i=1}^{I(0)}, and E.
Output: {x_i(N), y_i(N), q_i(N)}_{i=1}^{I(N)} with I(N) ≥ I(0).
Initialization:
Get the worst quality point and its index: q−(0) = min_{1≤i≤I(0)} q_i(0), i−(0) = argmin_{1≤i≤I(0)} q_i(0).
Compute the area of each pair of data points: a_i(0) = (x_{i+1}(0) − x_i(0)) × (y_{i+1}(0) − y_i(0)).
Get the biggest rectangle and its index: a+(0) = max_{1≤i≤I(0)−1} a_i(0), i+(0) = argmax_{1≤i≤I(0)−1} a_i(0).
Define the weighted area at step n = 0 as WA(0) = q−(0) × ∑_{i=1}^{I(0)−1} a_i(0).
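In numpy, the initialisation quantities above can be computed in a few lines (a sketch under the same array conventions as before):

```python
import numpy as np

def init_state(x, y, q):
    # Worst quality and its index, rectangle areas, biggest rectangle
    # and its index, and the weighted area WA(0) = q_-(0) * sum_i a_i(0).
    x, y, q = (np.asarray(v, dtype=float) for v in (x, y, q))
    q_minus, i_minus = q.min(), int(q.argmin())
    a = np.diff(x) * np.diff(y)
    a_plus, i_plus = a.max(), int(a.argmax())
    return q_minus, i_minus, a, a_plus, i_plus, q_minus * a.sum()
```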
- If WA(n) < E, then the algorithm aims at increasing the quality q−(n) of the worst data point (the one with the lowest quality), with index i−(n) = argmin_{1≤i≤I(n)} q_i(n) at step n. It stores the corresponding old value y_old, searches for a new value y_new by successively improving the quality of this very point, and stops when y_new > y_old.
- If WA(n) ≥ E, then the algorithm aims at driving the total area A(n) to zero. To that end, it identifies the biggest rectangle
a+(n) = max_{1≤i≤I(n)−1} a_i(n), (7)
and its index
i+(n) = argmax_{1≤i≤I(n)−1} a_i(n), (8)
and adds a new point x_new at the middle of this biggest rectangle. Then, it computes a new data value y_new = G(x_new, q_new) with a new quality q_new.
In both cases, the numerical parameters q_new (for example, a number of iterations, or the size of a sampling set or a population) are arbitrary, and any value can be chosen in practice each time a new point x_new is added to the data set. They can likewise be increased arbitrarily each time such a point has to be improved. Indeed, the numerical parameters q of the optimisation routine we have access to can be increased as much as desired, and increasing them improves the estimates G(x, q) of the true function F†(x) uniformly in x; see Assumption 1. The algorithm then verifies the consistency of the data set by checking the quality of each point. If any point is inconsistent, the algorithm recomputes its value, successively improving the corresponding reliability, until consistency is obtained. This is achieved in a finite number of steps, starting from an inconsistent point and exploring the data set from left to right.
Finally, the algorithm updates the quality vector {q_i(n+1)}_{i=1}^{I(n+1)}, the area vector {a_i(n+1)}_{i=1}^{I(n+1)}, the worst quality q−(n+1) and the index i−(n+1) of the corresponding point, the biggest rectangle a+(n+1) and its index i+(n+1), and then the new weighted area WA(n+1).
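Putting the pieces together, a compact sketch of the main loop might read as follows; the geometric schedule for increasing the optimisation parameters and the quality assigned to new points are illustrative choices, since Algorithm 1 leaves them arbitrary:

```python
import numpy as np

def reconstruct(G, x, y, r, E, n_steps, growth=2.0):
    # Sketch of Algorithm 1. G(x, q) -> observation; x, y, r: initial
    # consistent data with x sorted, x[0] = a, x[-1] = b; E: exchange
    # rate. Consistency is restored at the end of every step, so the
    # quality of each point equals its reliability r_i.
    x, y, r = (np.asarray(v, dtype=float).copy() for v in (x, y, r))
    for _ in range(n_steps):
        a = np.diff(x) * np.diff(y)            # rectangle areas a_i
        if r.min() * a.sum() < E:              # WA(n) < E: redo worst point
            i = int(r.argmin())
            y_old = y[i]
            while y[i] <= y_old:               # stop once the value improves
                r[i] *= growth
                y[i] = G(x[i], r[i])
        else:                                  # WA(n) >= E: split biggest rectangle
            i = int(a.argmax())
            x_new = 0.5 * (x[i] + x[i + 1])
            r_new = r.min()                    # arbitrary quality for the new point
            x = np.insert(x, i + 1, x_new)
            y = np.insert(y, i + 1, G(x_new, r_new))
            r = np.insert(r, i + 1, r_new)
        for j in range(1, len(x)):             # restore consistency, left to right
            while y[j] < y[j - 1]:
                r[j] *= growth
                y[j] = G(x[j], r[j])
    return x, y, r
```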
3.2. Proof of Convergence
We denote by I(n) the number of data points, and by {x_i(n), y_i(n), q_i(n)}_{i=1}^{I(n)} the positions of the data points, the observations given by the optimisation algorithm at these positions, and the associated qualities at step n of Algorithm 1. For each i = 1, ⋯, I(n) − 1, we define s_i(n) = [x_i(n), x_{i+1}(n)) ⊂ [a, b] and the vector containing all rectangle areas {a_i(n)}_{i=1}^{I(n)−1} by:
a_i(n) = (x_{i+1}(n) − x_i(n)) × (y_{i+1}(n) − y_i(n)). (9)
The pointwise observation y_i(n) = G(x_i(n), q_i(n)) is thus associated with the quality q_i(n) ∈ R_+, which quantifies the confidence we have in this observation, as outlined in the problem description of Section 2. This number can represent, for example, the inverse of the error achieved by the optimisation algorithm, the number of iterations, the number of individuals in a population, or any other numerical parameter pertaining to this optimisation process. The higher it is, the closer the observation is expected to be to the true target value. We therefore make the following assumption on the numerical process G.
Assumption 1.
G(x, q) converges to F†(x) as q → +∞, uniformly in x ∈ [a, b]; that is:
∀ε > 0, ∃Q > 0 such that ∀q ≥ Q, ∀x ∈ [a, b], F†(x) − G(x, q) ≤ ε.
Moreover, we can guarantee that:
∀x ∈ [a, b], ∀q ∈ R_+, G(x, q) ≤ F†(x).
That is, the optimisation algorithm always underestimates the true value F†(x). In this way, one can model the relationship between the numerical estimate G and the true value F† as:
∀x ∈ [a, b], ∀q ∈ R_+, G(x, q) = F†(x) − ε(x, q),
where ε is a positive random variable. These assumptions imply some robustness and stability of the algorithm we use.
In the following, we will assume that I(0) ≥ 2; that is, we have at least two data points at the beginning of the reconstruction algorithm, among which one point at x = a and another at x = b. Moreover, we will assume that the initial data set is consistent. Since Algorithm 1 recomputes the inconsistent points at every step, we can also consider in the following that any new numerical observation is actually consistent. Furthermore, we need to guarantee that the weighted area WA(n) oscillates about E permanently as the iteration step n increases; this is the purpose of Assumption 3 below, together with Propositions 1 and 3. From these properties it will then be shown that Algorithm 1 is convergent, as stated in Theorem 1.
Assumption 2. Any new numerical value obtained by Algorithm 1 is consistent.
Assumption 3.
q−(n) → +∞ as n → ∞.
Under Assumption 2, all points have a consistency of 1, and therefore q = r > 0, the reliability. Besides, one has G(x_i(n), q_i(n)) ≤ G(x_{i+1}(n), q_{i+1}(n)), i.e., y_i(n) ≤ y_{i+1}(n), for all points i and steps n. We finally define the sequence of piecewise constant reconstruction functions F(n) as follows.
Definition 1.
For each x ∈ [a, b], we define the reconstruction function F(n) at step n as:
F(n)(x) = ∑_{i=1}^{I(n)−1} y_i(n) 1_{s_i(n)}(x),
and F(n)(x_{I(n)}(n)) = F(n)(b) = y_{I(n)}(n).
Now let
E+ := {n ∈ N; WA(n) ≥ E},  E− := {n ∈ N; WA(n) < E},
which are such that E+ ∪ E− = N and E+ ∩ E− = ∅. In order to prove the convergence (in a sense to be made precise) of Algorithm 1, we first need to establish the intermediate results of Propositions 1, 2, and 3. They clarify the behaviour of the sequence WA(n) when points are added to the data set and the largest area a+(n) is divided into four parts at each iteration step n; see Figure 2.
Proposition 1.
E+ is infinite.
Proof.
Let us assume that E+ is finite: ∃N such that ∀n ≥ N, n ∈ E−. Therefore we are in the situation WA(n) < E, the minimum quality q−(n) of the data goes to infinity by Assumption 3, and the total area A(n) is modified although the evaluation points {x_i(n)}_{i=1}^{I(n)} and their number I(n) are unchanged; thus they are independent of n. Repeating this step yields
lim_{n→∞} A(n) = ∑_{i=1}^{I−1} (x_{i+1} − x_i)(F†(x_{i+1}) − F†(x_i)) =: A > 0
since F† is monotonically increasing and non-constant on [a, b], and Assumption 1 is used. Consequently, WA(n) → +∞ as n → ∞; that is, WA(n) ≥ E for all n ≥ N_1 for some N_1, which is a contradiction. □
The set E+ is therefore of the form
E+ = ⋃_{k≥1} ⟦m_k, n_k⟧,
where
⟦m_k, n_k⟧ := {n ∈ N; m_k ≤ n ≤ n_k}.
Let us introduce the strictly increasing map φ: N → N such that φ(p) is the p-th element of E+ (in increasing order), and ⟦m_k, n_k⟧ = φ(⟦p_k + 1, p_{k+1}⟧). Here, p is the counter of the elements of E+, and n is the corresponding iteration number.
Proposition 2.
Let I(φ(p)) = I(φ(0)) + p. Then
A(φ(p)) = ∑_{i=1}^{I(φ(p))−1} a_i(φ(p)) = O(1/√p)
as p → ∞, and A(n) → 0 as n → ∞.
Proof.
Let k ≥ 1 and n = φ(p) ∈ ⟦m_k, n_k⟧, where p ∈ ⟦p_k + 1, p_{k+1}⟧. Let A(n) be given by Equation (6), a+(n) by Equation (7), and i+(n) by Equation (8). At iteration n + 1 one has:
x_i(n+1) = x_i(n) for 1 ≤ i ≤ i+(n);  x_{i+(n)+1}(n+1) = (x_{i+(n)}(n) + x_{i+(n)+1}(n))/2;  x_i(n+1) = x_{i−1}(n) for i+(n) + 2 ≤ i ≤ I(n+1).
Also, y_i(n+1) ≤ y_{i+1}(n+1) for 1 ≤ i ≤ I(n+1) − 1. One may check that a+(n) = 2a_{i+(n)}(n+1) + 2a_{i+(n)+1}(n+1) (see Figure 2), and therefore:
A(n+1) = A(n) − a+(n) + a_{i+(n)}(n+1) + a_{i+(n)+1}(n+1) = A(n) − (1/2) a+(n).
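For completeness, this identity can be checked directly. Writing w = x_{i+(n)+1}(n) − x_{i+(n)}(n) for the width of the split rectangle and y_m = y_{i+(n)+1}(n+1) for the new observation at its midpoint, both new rectangles have width w/2 and their heights telescope:
a_{i+(n)}(n+1) + a_{i+(n)+1}(n+1) = (w/2)(y_m − y_{i+(n)}(n)) + (w/2)(y_{i+(n)+1}(n) − y_m) = (w/2)(y_{i+(n)+1}(n) − y_{i+(n)}(n)) = (1/2) a+(n),
which is exactly the claimed halving of the largest area.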
Besides, A(n) ≤ (I(n) − 1) a+(n), so that one has:
A(n+1) ≤ A(n) − A(n)/(2(I(n) − 1)) = A(n) · (2(I(n) − 1) − 1)/(2(I(n) − 1)),
or:
A(φ(p) + 1) ≤ A(φ(p)) · (2(I(φ(p)) − 1) − 1)/(2(I(φ(p)) − 1)). (14)
At this stage two situations arise:
- either p ∈ ⟦p_k + 1, p_{k+1} − 1⟧, in which case φ(p) + 1 = φ(p + 1);
- or p = p_{k+1}, in which case by our algorithm A(n) is kept constant from n = n_k + 1 to n = m_{k+1}; that is, A(n_k + 1) = A(m_{k+1}), or:
A(φ(p_{k+1}) + 1) = A(φ(p_{k+1} + 1)).
The choice of k being arbitrary, one concludes that Equation (14) also reads, for all p ∈ N:
A(φ(p+1)) ≤ A(φ(p)) · (2(I(φ(p)) − 1) − 1)/(2(I(φ(p)) − 1)) = A(φ(p)) · (2(I(φ(0)) + p − 1) − 1)/(2(I(φ(0)) + p − 1)).
Thus:
A(φ(p)) ≤ A(φ(1)) ∏_{i=1}^{p−1} (2(I(φ(0)) + i − 1) − 1)/(2(I(φ(0)) + i − 1)) = A(φ(1)) ∏_{i=1}^{p−1} (1 + α/i)/(1 + β/i),
where α = I(φ(0)) − 3/2 and β = I(φ(0)) − 1.
However,
∑_{i=1}^{p} log(1 + α/i) = α ∑_{i=1}^{p} 1/i + C″_p,
where lim_{p→∞} C″_p = C″ exists, and
∑_{i=1}^{p} 1/i = log p + γ + ε′_p,
where γ is the Euler constant and lim_{p→∞} ε′_p = 0. Consequently:
∑_{i=1}^{p−1} log(1 + α/i) − ∑_{i=1}^{p−1} log(1 + β/i) = (α − β) log(p − 1) + C′_p = (α − β) log p + (α − β) log(1 − 1/p) + C′_p = log(1/√p) + C_p,
since α − β = −1/2; again, C_p and C′_p are sequences with finite limits lim_{p→∞} C_p = C and lim_{p→∞} C′_p = C′. Therefore,
∏_{i=1}^{p−1} (1 + α/i)/(1 + β/i) = (C/√p)(1 + ε_p),
where C > 0 is a constant and lim_{p→∞} ε_p = 0; in particular, A(φ(p)) = O(1/√p). One also concludes that A(n), which at each step is either kept constant or equal to A(φ(p)) for some p, converges to 0 as n → ∞. Hence the claimed results hold. □
Proposition 3.
E− is infinite.
Proof.
Let us assume that E− is finite: ∃N such that ∀n ≥ N, n ∈ E+. Therefore we are in the situation WA(n) ≥ E > 0, and φ has the form φ(p) = p + n_0 for all p large enough and some n_0 ∈ N. From Proposition 2, applied with p = n − n_0:
A(n) = A(φ(n − n_0)) = O(1/√n),
thus A(n) → 0 and WA(n) → 0 as n → ∞, since q−(n) is kept unchanged, which is a contradiction. □
We now provide three results on the convergence of Algorithm 1. As is to be expected, the algorithm can only be shown to converge uniformly when the target response functionF†is sufficiently smooth; otherwise, the convergence is at best pointwise or in mean.
Theorem 1 (Algorithm convergence). Assume that F† is strictly increasing. Then, for any choice of E > 0, Algorithm 1 is convergent in the following senses:
- If F† is piecewise continuous on [a, b], then lim_{n→∞} F(n)(x) = F†(x) at all points x ∈ [a, b] where F† is continuous;
- If F† is continuous on [a, b], then convergence holds uniformly: ∥F(n) − F†∥_∞ → 0 as n → ∞.
Proof.
Let E > 0. We know from Propositions 1 and 3 that WA(n) oscillates about E in the iterating process as n → ∞, while lim_{n→∞} q−(n) = +∞ from Assumption 3. Furthermore, let
Δ(n) := sup_{1≤i≤I(n)−1} (x_{i+1}(n) − x_i(n)).
Assuming, for example, that for some j the interval s_j(n) = [x_j(n), x_{j+1}(n)) is never divided in two in the iteration process, and is thus independent of n, it turns out that a_j(n) → (x_{j+1} − x_j)(F†(x_{j+1}) − F†(x_j)) > 0 as n → ∞, which is impossible because A(n) goes to 0 as n → ∞ by Proposition 2. Therefore, there exists some m ∈ N* (depending on n) such that Δ(n + m) ≤ (1/2)Δ(n); also, the sequence Δ(n) is decreasing, hence Δ(n) → 0 as n → ∞.
Now let x ∈ [x_i(n), x_{i+1}(n)). Then:
|F(n)(x) − F†(x)| = |G(x_i(n), q_i(n)) − F†(x)| ≤ |G(x_i(n), q_i(n)) − F†(x_i(n))| + |F†(x_i(n)) − F†(x)|.
Moreover, x_i(n) → x as n → ∞ because Δ(n) → 0; thus, if F† is continuous at x, the second term on the right-hand side above goes to 0 as n → ∞. If F† is continuous everywhere on [a, b], it is in addition uniformly continuous on [a, b] by Heine's theorem, and the second term goes to 0 as n → ∞ uniformly on [a, b]. Finally, invoking Assumption 1, the first term on the right-hand side above also tends to 0 as n → ∞. This completes the proof. □
Proposition 4 (Convergence in mean). Let F†: [a, b] → R be piecewise continuous. Then Algorithm 1 is convergent in mean, in the sense that
∥F(n) − F†∥_1 → 0 as n → ∞.
Proof.
We can check that the sequence F(n) is non-decreasing in n. Indeed, if WA(n) < E, then by construction we have
F(n+1)(x) − F(n)(x) ≥ (y_{i−(n)}(n+1) − y_{i−(n)}(n)) 1_{s−(n)}(x) ≥ 0,
where s−(n) = [x_{i−(n)}(n), x_{i−(n)+1}(n)). If, on the other hand, WA(n) ≥ E, then consistency implies that
F(n+1)(x) − F(n)(x) ≥ (y_{i+(n)+1}(n+1) − y_{i+(n)}(n)) 1_{s+(n+1)}(x) ≥ 0,
where s+(n+1) = [x_{i+(n)+1}(n+1), x_{i+(n)+2}(n+1)). The claim now follows from the monotone convergence theorem and the fact that F(0) is integrable. □
4. Test Cases
To show the effectiveness of Algorithm 1, we try it on two test cases, in which F† is a continuous function and a discontinuous function, respectively. For both cases, the error between the numerical estimate and the ground truth function is modelled as a random variable following a log-normal distribution. That is,
∀x ∈ [a, b], ε(x) ∼ LogN(μ(x), σ²),
with σ² = 1, and μ(x) chosen such that P[0 ≤ ε(x) ≤ 0.1 · F†(x)] = 0.9. Thus, the parameter μ(x) is different for each x ∈ [a, b].
As we have access to the ground truth function, and for validation purposes, the quality value associated with a numerical point is taken to be the inverse of the relative error. Moreover, we assume that the initial points are consistent.
For illustrative purposes, we set the parameter E = 15 for the examples considered below.
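For reference, the noise model and quality assignment just described can be implemented as follows (a sketch; the quantile computation follows the defining condition P[0 ≤ ε(x) ≤ 0.1 · F†(x)] = 0.9 with σ = 1, while the function names are ours):

```python
import numpy as np
from scipy.stats import norm

SIGMA = 1.0

def mu_of_x(F_true_x, p=0.9, frac=0.1):
    # Solve P[eps <= frac * F†(x)] = p for eps ~ LogN(mu, SIGMA^2),
    # i.e. Phi((log(frac * F†(x)) - mu) / SIGMA) = p.
    return np.log(frac * F_true_x) - SIGMA * norm.ppf(p)

def observe(F_true_x, rng):
    # One under-estimating observation y = F†(x) - eps with eps > 0.
    eps = rng.lognormal(mean=mu_of_x(F_true_x), sigma=SIGMA)
    return F_true_x - eps

def quality(y, F_true_x):
    # Validation-only quality: the inverse of the relative error.
    return abs(F_true_x) / abs(F_true_x - y)
```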
4.1. F† Is a Continuous Function
First, consider the function F† ∈ C^0([1, 2], [1, 2]) defined as follows:
F†(x) = F_1†(x) if x ∈ [1, 3/2],  F_2†(x) if x ∈ [3/2, 2],
with
F_1†(x) = a_1 exp(x³) + b_1,  F_2†(x) = a_2 exp((3 − x)³) + b_2, (16)
where:
a_1 = −1/(2(exp(1) − exp(27/8))),  b_1 = (3 − 2 exp(19/8))/(2(1 − exp(19/8))),  a_2 = −a_1,  b_2 = 2a_1 exp(27/8) + b_1.
The target function F† and the reconstructions F(n) obtained with the algorithm for several values of the step n are shown in Figure 3. For each n, the reconstruction F(n) is increasing, and the initial points are consistent. The ∞-norm and 1-norm of the error appear to converge to zero with approximate rates −0.512 and −0.534, respectively.
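Such rates can be estimated, for example, by a least-squares fit of the logarithm of the error against the logarithm of the step count (a sketch; the recorded error sequence is assumed available):

```python
import numpy as np

def empirical_rate(steps, errors):
    # Slope of log(error) against log(step): the exponent in the
    # power-law fit error ~ step^rate, e.g. about -0.51 here.
    slope, _ = np.polyfit(np.log(steps), np.log(errors), 1)
    return slope
```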
4.2. F† Is a Discontinuous Function
Now, consider the discontinuous function F† defined as follows:
F†(x) = F_1†(x) if x ∈ [1, 3/2],  F_2†(x) if x ∈ (3/2, 2],
where F_1† is given by Equation (16), F_2†(x) = a_2 exp(x³) + b_2 (so that F_2† is increasing with the constants below), and:
a_1 = −1/(2(exp(1) − exp(27/8))),  b_1 = (3 − 2 exp(19/8))/(2(1 − exp(19/8))),  a_2 = 2/(5(exp(8) − exp(27/8))),  b_2 = (10 − 8 exp(37/8))/(5(1 − exp(37/8))).
Here, F† is piecewise continuous, being continuous on [1, 3/2] and on (3/2, 2]. In this case, one can apply Proposition 4. The target function F† and the reconstructions F(n) obtained with the algorithm for several values of the step n are shown in Figure 4. Observe that the approximation quality, as measured by the ∞-norm of the error F† − F(n), quite rapidly saturates and does not converge to zero. This is to be expected for this discontinuous target F†, since closeness of two functions in the supremum norm requires that they have approximately the same discontinuities in exactly the same places. The 1-norm error, in contrast, appears to converge at the rate −0.561.
Regarding computational cost, the number of calls to the numerical model is lower when F† is continuous than when it is discontinuous. For both examples above and for the same number of data points, the number of evaluations of the numerical model (an analytical formula in the present case) in the discontinuous case is about six times higher than in the continuous case. This is because the algorithm typically adds more points near discontinuities, and the effort of making them consistent increases the number of calls to the model.
4.3. Influence of the User-Defined Parameter E
We consider the case in which F† is discontinuous, as in Section 4.2, and show the influence of the choice of the parameter E on the reconstruction function F(n).
4.3.1. Case E ≪ 1
Let us consider the case E = 10^−4 ≪ 1. This choice corresponds to the case where one prefers splitting the biggest rectangle over redoing the worst-quality point. This can be seen in Figure 5, where the worst quality is almost constant over 100 steps while the sum of areas decreases strongly; see Figure 5e and Figure 5f, respectively. At each step, the algorithm adds a new point by splitting the biggest rectangle. One can note in Figure 5e that the minimum of the quality is not quite constant: whenever the addition of a new data point left the worst-quality point inconsistent, that point had to be recomputed. In summary, in this case we obtain more points, but with lower quality values.
4.3.2. Case E ≫ 1
We now consider the case E = 10^4 ≫ 1. This choice corresponds to the case where one prefers redoing the worst-quality point over splitting. This can be seen in Figure 6, where the sum of areas stays more or less the same over 100 steps while the minimum of the quality surges; see Figure 6f and Figure 6e, respectively. No new point is added: the algorithm only redoes the worst-quality point in order to improve it. To sum up, we obtain fewer points, but with higher quality values.
5. Application to Optimal Uncertainty Quantification
5.1. Optimal Uncertainty Quantification
In the optimal uncertainty quantification paradigm proposed by Owhadi et al. [7] and further developed by, e.g., Sullivan et al. [8] and Han et al. [9], upper and lower bounds on the performance of an incompletely specified system are calculated via optimisation problems. More concretely, one is interested in the probability that a system, whose output is a function g†: X → R of inputs Ξ distributed according to a probability measure μ† on an input space X, satisfies g†(Ξ) ≤ x, where x is a specified performance threshold value. We emphasise that although we focus on a scalar performance measure, the input Ξ may be a multivariate random variable.
In practice, μ† and g† are not known exactly; rather, it is known only that (μ†, g†) ∈ A for some admissible subset A of the product of the space of all probability measures on X with the set of all real-valued functions on X. Thus, one is interested in
P̲_A(x) := inf_{(g,μ)∈A} P_{Ξ∼μ}[g(Ξ) ≤ x]  and  P̄_A(x) := sup_{(g,μ)∈A} P_{Ξ∼μ}[g(Ξ) ≤ x].
The inequality
0 ≤ P̲_A(x) ≤ P_{Ξ∼μ†}[g†(Ξ) ≤ x] ≤ P̄_A(x) ≤ 1
is, by definition, the tightest possible bound on the quantity of interest P_{Ξ∼μ†}[g†(Ξ) ≤ x] that is compatible with the information used to specify A. Thus, the optimal UQ perspective enriches the principles of worst- and best-case design to account for distributional and functional uncertainty. We concentrate our attention hereafter, without loss of generality, on the least upper bound P̄_A(x).
Remark 1.
The main focus of this paper is the dependency of P̄_A(x) on x. In practice, an underlying task is, for any individual x, reducing the calculation of P̄_A(x) to a tractable finite-dimensional optimisation problem. Central enabling results here are the reduction theorems of Owhadi et al. [7] (Section 4), which, loosely speaking, say that if, for each g, {μ | (μ, g) ∈ A} is specified by a system of m equality or inequality constraints on expected values of arbitrary test functions under μ, then for the determination of P̄_A(x) it is sufficient to consider only distributions μ that are convex combinations of at most m + 1 point masses; the optimisation variables are then the m independent weights and m + 1 locations in X of these point masses. If μ factors as a product of distributions (i.e., Ξ is a vector with independent components), then this reduction theorem applies componentwise.
As a function of the performance threshold x, P̄_A(x) is increasing, and so it is potentially advantageous to determine P̄_A(x) jointly for a wide range of x values using the algorithm developed above. Indeed, determining P̄_A(x) for many values of x, rather than just one value, is desirable for multiple reasons:
- Since the numerical optimisation used to determine P̄_A(x) may be affected by errors, computing several values of P̄_A(x) makes it possible to validate their consistency, as the function x ↦ P̄_A(x) must be increasing;
- The function P̄_A(x) can be discontinuous. Thus, by computing several values of P̄_A(x), one can highlight potential discontinuities and identify key threshold values of x ↦ P̄_A(x).
5.2. Test Case
For the application of Algorithm 1 to OUQ, we study the robust shape optimisation of the two-dimensional RAE2822 airfoil [10] (Appendix A6) using ONERA's CFD software elsA [11]. The following example is taken from Dumont et al. [12]. The shape of the original RAE2822 is altered using four bumps located at four different positions: 5%, 20%, 40%, and 60% of the way along the chord c (see Figure 7). These bumps are characterised by B-spline functions.
The lift-to-drag ratio C_l/C_d of the RAE2822 wing profile (see Figure 8) at Reynolds number Re = 6.5 × 10^6, Mach number M∞ = 0.729, and angle of attack α = 2.31° is chosen as the performance function g†, with inputs Ξ = (Ξ_1, Ξ_2, Ξ_3, Ξ_4), where Ξ_i is the amplitude of the i-th bump. These amplitudes are considered as random variables over their respective ranges given in Table 1.
The corresponding flow values are the ones described in test case #6, together with the wall interference correction formulas given in [13] (Chapter 6) and in [14] (Section 5.1). Moreover, we will assume that (Ξ_i)_{i=1⋯4} are mutually independent. An ordinary Kriging procedure has been chosen to build a metamodel (or response surface) of g†, which is identified with the actual response function g† in the subsequent analysis. A tensorised grid of 9 equidistributed abscissas for each parameter is used; the model is then based on N = 9^4 = 6561 observations. In that respect, a Gaussian kernel
K(Ξ, Ξ′) = exp(−(1/2) ∑_{i=1}^{4} (Ξ_i − Ξ′_i)²/γ_i²)
has been chosen, where Ξ = (Ξ_1, Ξ_2, Ξ_3, Ξ_4) and Ξ′ = (Ξ′_1, Ξ′_2, Ξ′_3, Ξ′_4) are inputs of the function g†, and where γ = (γ_1, γ_2, γ_3, γ_4) are the parameters of the kernel. These parameters are chosen to minimise the variance between the ground truth data defined by the N observations and their Kriging metamodel g†. The response surfaces in the (Ξ_1, Ξ_3) plane for two values of (Ξ_2, Ξ_4) are shown in Figure 9.
One seeks to determineP¯A(x):=supμ∈A PΞ∼μ[g†(Ξ)≤x], where the admissible setAis defined as follows:
A = {(g, μ) :  Ξ ∈ X = X_1 × X_2 × X_3 × X_4;  g: X → Y is known, equal to g†;  μ = μ_1 ⊗ μ_2 ⊗ μ_3 ⊗ μ_4;  E_{Ξ∼μ}[g(Ξ)] = L/D}. (17)
A priori, findingP¯A(x)is not computationally tractable because it requires a search over a infinite-dimensional space of probability measures defined byA . Nevertheless, as described briefly in Remark 1, it has been shown in Owhadi et al. [7] that this optimisation problem can be reduced to a finite-dimensional one, where now the probability measures are products of finite convex combinations of Dirac masses.
Remark 2.
The ground truth law μ_i† of each input variable given in Table 1 is only used to compute the expected value L/D = E_{Ξ∼μ†}[g(Ξ)]. This expected value is computed with 10^4 samples.
Remark 3.
The admissible set A from Equation (17) can be understood as follows:
- One knows the range of each input parameter (Ξ_i)_{i=1,⋯,4};
- g is exactly known, with g = g†;
- (Ξ_i)_{i=1,⋯,4} are independent;
- One only knows the expected value of g: E_{Ξ∼μ}[g(Ξ)].
The optimisation problem of determiningP¯A(x) for each chosen x was solved using the Differential Evolution algorithm of Storn and Price [15] within the mystic optimisation framework [16]. Ten iterations of Algorithm 1 have been performed usingE=1×104. The evolution ofP¯A(x) as function of the iteration count, n, is shown in Figure 10. Atn=0 —see Figure 10a—two consistent points are present atx=57.51andx=67.51. At this step,WA(0)=35289. AsWA(0)≥E, at next stepn=1 , the algorithm adds a new point at the middle of the biggest rectangle—see Figure 10b and Figure 11b. Aftern=10steps, eight points are now present in total with a minimum quality increasing from 5000 to 11,667 and with a total area decreasing from7.05to0.84 ; see Figure 11a and Figure 11b respectively.
The number of iterations in this complex numerical experiment has been limited to 10 because obtaining a new or improved data point that is consistent with the rest of the data set may take up to two days of wall-clock time per point (on a personal computer equipped with an Intel Core i5-6300HQ processor with 4 cores and 6 MB of cache memory). This running time increases further for data points of higher quality. Nevertheless, this experiment shows that the proposed algorithm can be used for real-world examples in an industrial context.

6. Concluding Remarks
In this paper, we have developed an algorithm to reconstruct a monotonically increasing function such as the cumulative distribution function of a real-valued random variable, or the least upper bound of the performance criterion of a system as a function of its performance threshold. In particular, this latter setting is relevant to the optimal uncertainty quantification (OUQ) framework of [7], which we have in mind for applications to real-world incompletely specified systems. The algorithm uses imperfect pointwise evaluations of the target function, subject to partially controllable one-sided errors, to direct further evaluations, either at new sites in the function's domain or at already-evaluated sites whose quality is to be improved. It allows for some flexibility in targeting either strategy through a user-defined “exchange rate” parameter, yielding an approximation of the target function with a few high-quality points or, alternatively, more lower-quality points. We have studied its convergence properties and have applied it to several examples: known target functions that are either continuous or discontinuous, and a performance function for the aerodynamic design of a well-documented standard profile in the OUQ setting.
Algorithm 1 is reminiscent of the classical PAVA approach to isotonic regression, which applies to statistical inference with order restrictions. Examples of its use can be found in shape-constrained or parametric density problems, as illustrated in, e.g., [6]. Possible improvements and extensions of our algorithm include weighting the areas a_i(n) as they are summed up to form the total weighted area WA(n) driving the iterative process, in order to optimally enforce both the addition of “steps” s_i(n) in the reconstruction function F(n) of Definition 1 and the improvement of their “heights” y_i(n). This could be achieved by considering, for example, the alternative definition i+(n) = argmax_i {(I(n) − i − 1) a_i(n)} in Algorithm 1, which results in both adding a step to the i+(n)-th current one and possibly improving all subsequent evaluations y_i(n+1), i > i+(n). We may further envisage adapting the ideas elaborated in this research to the reconstruction of convex functions by extending the notion of consistency. These perspectives shall be considered in future works.
Table 1. Range and ground truth law of each bump amplitude.
 | Range | Law |
---|---|---|
Bump 1: Ξ_1 | [−0.0025c, +0.0025c] | μ_1†: Beta law with α = 6, β = 6 |
Bump 2: Ξ_2 | [−0.0025c, +0.0025c] | μ_2†: Beta law with α = 2, β = 2 |
Bump 3: Ξ_3 | [−0.0025c, +0.0025c] | μ_3†: Beta law with α = 2, β = 2 |
Bump 4: Ξ_4 | [−0.0025c, +0.0025c] | μ_4†: Beta law with α = 2, β = 2 |
Author Contributions
Conceptualization, L.B. and T.J.S.; methodology, L.B. and T.J.S.; software, L.B.; validation, J.-L.A., É.S., and T.J.S.; formal analysis, L.B., J.-L.A., É.S., and T.J.S.; investigation, L.B.; resources, L.B., J.-L.A., É.S., and T.J.S.; data curation, L.B.; writing-original draft preparation, L.B.; writing-review and editing, L.B., J.-L.A., É.S., and T.J.S.; visualization, L.B.; supervision, É.S. and T.J.S.; project administration, T.J.S.; funding acquisition, L.B., É.S., and T.J.S. All authors have read and agreed to the published version of the manuscript.
Funding
The work of J.-L.A. and É.S. has been partially supported by ONERA within the Laboratoire de Mathématiques Appliquées pour l'Aéronautique et Spatial (LMA2S). L.B. is supported by a CDSN grant from the French Ministry of Higher Education (MESRI) and a grant from the German Academic Exchange Service (DAAD), Program #57442045. T.J.S. has been partially supported by the Freie Universität Berlin within the Excellence Strategy of the DFG, including project TrU-2 of the Excellence Cluster "MATH+ The Berlin Mathematics Research Center" (EXC-2046/1, project 390685689) and DFG project 415980428.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
CFD | Computational Fluid Dynamics |
OUQ | Optimal Uncertainty Quantification |
PAVA | Pool-Adjacent-Violators Algorithm |
© 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”).
Abstract
Motivated by the desire to numerically calculate rigorous upper and lower bounds on deviation probabilities over large classes of probability distributions, we present an adaptive algorithm for the reconstruction of increasing real-valued functions. While this problem is similar to the classical statistical problem of isotonic regression, the optimisation setting alters several characteristics of the problem and opens natural algorithmic possibilities. We present our algorithm, establish sufficient conditions for convergence of the reconstruction to the ground truth, and apply the method to synthetic test cases and a real-world example of uncertainty quantification for aerodynamic design.