1. Introduction
This paper considers the problem of adaptively reconstructing a monotonically increasing function F† from imperfect pointwise observations of this function. In the statistical literature, the problem of estimating a monotone function is commonly known as isotonic regression, and it is assumed that the observed data consist of noisy pointwise evaluations of F†. However, we consider this problem under assumptions that differ from the standard formulation, and these differences motivate our algorithmic approach. To be concrete, our two motivating examples are that
F†(x) := P_{Ξ∼μ}[g(Ξ) ≤ x] (1)
is the cumulative distribution function (CDF) of a known real-valued function g of a random variable Ξ with known distribution μ, or that
F†(x) := sup_{(g,μ)∈A} P_{Ξ∼μ}[g(Ξ) ≤ x] (2)
is the supremum of a family of such CDFs over some class A. We assume that we have access to a numerical optimisation routine that can, for each x and some given numerical parameters q (e.g., the number of iterations or other convergence tolerance parameters), produce a numerical estimate or observation G(x, q) of F†(x); furthermore, we assume that G(x, q) ≤ F†(x) always holds, i.e., the numerical optimisation routine always under-estimates the true optimum value, and that the positive error F†(x) − G(x, q) can be controlled to some extent through the choice of the optimisation parameters q, but remains essentially influenced by randomness in the optimisation algorithm for each x. The assumption G(x, q) ≤ F†(x) is, for example, coherent with either Equation (1), which may be approached by increasing the number of samples (say q) in a Monte Carlo simulation, or Equation (2), which is a supremum over a set that may be explored only partially by the algorithm.
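To fix ideas, the following Python sketch implements one such observation process for a toy instance of Equation (2): the supremum is taken over a family of shifted Gaussian CDFs, and a random search explores only q members of the family, so that G(x, q) ≤ F†(x) always holds and the one-sided error decreases stochastically with q. The specific family and search strategy are illustrative assumptions, not the routines used later in the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def F_dagger(x):
    # Toy target: F†(x) = sup over t in [-1, 1] of P[Xi - t <= x]
    # with Xi ~ N(0, 1), i.e. sup of Phi(x + t), attained at t = 1.
    return norm.cdf(x + 1.0)

def G(x, q):
    # Imperfect observation: explore only q randomly drawn members of
    # the family, so G(x, q) <= F_dagger(x) always, and the error
    # F_dagger(x) - G(x, q) shrinks (stochastically) as q grows.
    ts = rng.uniform(-1.0, 1.0, size=int(q))
    return float(np.max(norm.cdf(x + ts)))
```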
A single observation G(x, q) yields only limited information about F†(x); a key limitation is that one may not even know a priori how accurate G(x, q) is. Naturally, one may repeatedly evaluate G at x, perhaps with different values of the optimisation parameters q, in order to estimate F†(x) more accurately. However, a key observation is that a suite of observations G(x_i, q_i), i = 1, ⋯, I, contains much more information than simply the estimates of F†(x_i), i = 1, ⋯, I, and this information can and must be used. For example, suppose that the values (G(x_i, q_i))_{i=1}^{I} are not increasing, e.g., because
G(x_i, q_i) > G(x_{i′}, q_{i′})  and  x_i < x_{i′}.
Such a suite of observations would be inconsistent with the axiomatic requirement that F† is an increasing function. In particular, while the observation at x_i may be relatively good or bad on its own merits, the observation G(x_{i′}, q_{i′}) at x_{i′}, which violates monotonicity, is in some sense “useless”, as it gives no better lower bound on F†(x_{i′}) than the observation at x_i does. The observation at x_{i′} is thus a good candidate for repetition with more stringent optimisation parameters q, and this is not something that could have been known without comparing it to the rest of the data set.
The purpose of this article is to leverage this and similar observations to define an algorithm for the reconstruction of the function F†, repeating old observations of insufficient quality and introducing new ones as necessary. The principal parameter in the algorithm is an “exchange rate” E that quantifies the degree to which the algorithm prefers a few high-quality evaluations over many poor-quality evaluations. Our approach is slightly different from classical isotonic (or monotonic) regression, which is understood as the least-squares fitting of an increasing function to a set of points in the plane. The latter problem is uniquely solvable, and its solution can be constructed by the pool adjacent violators algorithm (PAVA) extensively studied in Barlow et al. [1]. This algorithm explores the data set from left to right until the monotonicity condition is violated, and replaces the offending observations by their average, back-averaging to the left if needed to maintain monotonicity. Extensions of the PAVA have been developed by de Leeuw et al. [2] to consider non-least-squares loss functions and repeated observations, by Tibshirani et al. [3] to consider “nearly isotonic” or “nearly convex” fits, and by Jordan et al. [4] to consider general loss functions and partially ordered data sets. Useful references on isotonic regression also include Robertson et al. [5] and Groeneboom and Jongbloed [6].
The remainder of this paper is structured as follows. Section 2 presents the problem description and notation, after which the proposed adaptive algorithm for the reconstruction of F† is presented in Section 3. We establish the convergence properties of the algorithm in Section 3.2 and study its performance on several analytically tractable test cases in Section 4. Section 5 details the application of the algorithm to a challenging problem of the form of Equation (2) drawn from aerodynamic design. Some closing remarks are given in Section 6.
2. Notation and Problem Description
In the following, the “ground truth” response function that we wish to reconstruct is denoted by F†: [a, b] → R and has inputs x ∈ [a, b] ⊂ R. It is assumed that F† is monotonically increasing and non-constant on [a, b]. In contrast, G: [a, b] × R_+ → R denotes the numerical process used to obtain an imperfect pointwise observation y of F†(x) at some point x ∈ [a, b] for some numerical parameter q ∈ R_+. Here, on a heuristic level, q > 0 stands for the “quality” of the noisy evaluation G(x, q).
The main aim of this paper is to show the effectiveness of the proposed algorithm for the adaptive reconstruction of F†, which may or may not be continuous, from imperfect pointwise observations G(x_i, q_i) of F†, where we are free to choose x_{i+1} and q_{i+1} adaptively based on x_j, q_j, and G(x_j, q_j) for j ≤ i.
First, we associate with the I imperfect pointwise observations {x_i, y_i := G(x_i, q_i)}_{i=1}^{I} ⊂ [a, b] × R a set of positive numbers {q_i}_{i=1}^{I} ⊂ R_+, which we will call qualities. The quality q_i quantifies the confidence we have in the pointwise observation y_i of F†(x_i) obtained using the numerical process G(x_i, q_i): the higher this value, the greater the confidence. We write this quality as the product of two numbers c_i and r_i, q_i = c_i × r_i, with the following definitions:
- Consistency c_i ∈ {0, 1}: This describes the fact that two successive points must be monotonically consistent with each other. That is, if one takes two input values x_2 > x_1, one should have y_2 ≥ y_1, since the target function is monotonically increasing. There is no consistency associated with the very first data point, as it has no predecessor.
- Reliability r_i ∈ R_+: This describes how confident we are in the numerical value itself. Typically, it will be related to some error estimator, if one is available, or to the choice of optimisation parameters. It is expected that the higher the reliability, the closer the pointwise observation is to the true value, on average.
Typically, if the observation y_{i+1} = G(x_{i+1}, q_{i+1}) is consistent with respect to the observation y_i = G(x_i, q_i), where x_{i+1} > x_i, then the quality associated with y_{i+1} is q_{i+1} = r_{i+1} ∈ R_+^*, since c_{i+1} = 1 in this case. If the value is not consistent, we have q_{i+1} = r_{i+1} × c_{i+1} = 0. Finally, if x = a there is no notion of consistency, as there is no point preceding it; the quality associated with this point is therefore equal to its reliability alone.
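In code, this bookkeeping amounts to a single left-to-right scan; the following sketch assumes the observations are stored in arrays sorted by x (the function and variable names are ours):

```python
import numpy as np

def qualities(y, r):
    # q_i = c_i * r_i: the consistency c_i is 1 if y_i >= y_{i-1} and
    # 0 otherwise; the first point has no predecessor, so q_1 = r_1.
    q = np.asarray(r, dtype=float).copy()
    for i in range(1, len(y)):
        if y[i] < y[i - 1]:   # monotonicity violated: c_i = 0
            q[i] = 0.0
    return q
```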
Moreover, we associate to these pointwise observations a notion of area, illustrated in Figure 1 and defined as follows. Consider two consecutive points x_i and x_{i+1} with their respective observations y_i and y_{i+1}; the area a_i for these two points is
a_i = (x_{i+1} − x_i) × (y_{i+1} − y_i). (3)
Thus, we can define a vector a = {a_i}_{i=1}^{I−1} which contains all the computed areas for the whole data set. In addition, we can assert that if we take two points x_1 and x_2 > x_1 with y_1 = F†(x_1) and y_2 = F†(x_2) (namely, the error at these points is zero), then the graph of the ground truth function F† must lie within the rectangular area spanned by the two points (x_1, F†(x_1)) and (x_2, F†(x_2)).
To adopt a conservative point of view, we choose as the approximating function F of F† a piecewise constant interpolation function, namely:
F(x) = ∑_{i=1}^{I−1} y_i 1_{[x_i, x_{i+1})}(x),
where 1_I denotes the indicator function of the interval I. We do not want this interpolation function to overestimate the true function F†, as we know that the numerical estimate in our case always underestimates the ground truth F†(x). See Figure 1 for an illustration of this choice, which can be viewed as a worst-case approach. Indeed, this interpolation function is the worst possible function underestimating F† given two points x_1 and x_2 and their respective observations y_1 and y_2.
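For instance, the conservative reconstruction can be evaluated as follows (a minimal sketch assuming sorted numpy arrays; the helper name is ours):

```python
import numpy as np

def F_hat(t, x, y):
    # Piecewise-constant under-estimate: F(t) = y_i for t in [x_i, x_{i+1}).
    # Since y_i <= F†(x_i) <= F†(t) for t >= x_i and F† is increasing,
    # F never over-estimates the ground truth.
    idx = np.clip(np.searchsorted(x, t, side="right") - 1, 0, len(y) - 1)
    return np.asarray(y)[idx]
```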
3. Reconstruction Algorithms
The reconstruction algorithm that we propose, Algorithm 1, is driven to produce a sequence of reconstructions that converges to F† by following a principle of area minimisation: we associate to the discrete data set {x_i, y_i}_{i=1}^{I} ⊂ [a, b] × R the natural notion of area of Equation (3), as explained above, and seek to drive this area towards zero. The motivation behind this objective is Proposition 2, which states that the area converges to 0 as more points are added to the data set. However, the objective of minimising the area is complicated by the fact that evaluations of F† are imperfect. Therefore, a key user-defined parameter in the algorithm is E ∈ (0, ∞), which can be thought of as an “exchange rate” that quantifies to what extent the algorithm prefers to redo poor-quality evaluations of the target function versus driving the area measure to zero.
3.1. Algorithm
The main algorithm is organised as follows, starting from I(0) ≥ 2 points and a data set that is assumed to be consistent at the initial step n = 0. It goes through N iterations, where N is either fixed a priori or determined a posteriori once a stopping criterion is met. Note that q_new stands for the quality of a newly generated observation y_new at any new point x_new introduced by the algorithm. The latter is driven by the user-defined “exchange rate” E, as explained above. At each step n, the algorithm computes the weighted area WA(n) as the product of the minimum quality and the sum of the areas of the data points:
WA(n) = q−(n) × A(n),
where
q−(n) = min_{1≤i≤I(n)} q_i(n),  A(n) = ∑_{i=1}^{I(n)−1} a_i(n), (6)
a_i(n) is the area computed by Equation (3) at step n (see also Equation (9)), and I(n) is the number of data points. The algorithm then branches into two cases according to the value of WA(n) compared to E.
Algorithm 1: Adaptive algorithm to reconstruct a monotonically increasing function F†
Input: I(0) ≥ 2, {x_i(0), y_i(0), q_i(0)}_{i=1}^{I(0)}, and E.
Output: {x_i(N), y_i(N), q_i(N)}_{i=1}^{I(N)} with I(N) ≥ I(0).
Initialization:
Get the worst quality point and its index: q−(0) = min_{1≤i≤I(0)} q_i(0), i−(0) = argmin_{1≤i≤I(0)} q_i(0).
Compute the area of each pair of data points: a_i(0) = (x_{i+1}(0) − x_i(0)) × (y_{i+1}(0) − y_i(0)).
Get the biggest rectangle and its index: a+(0) = max_{1≤i≤I(0)−1} a_i(0), i+(0) = argmax_{1≤i≤I(0)−1} a_i(0).
Define the weighted area at step n = 0 as WA(0) = q−(0) × ∑_{i=1}^{I(0)−1} a_i(0).
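In numpy, the initialisation quantities above can be computed in a few lines (a sketch under the same array conventions as before):

```python
import numpy as np

def init_state(x, y, q):
    # Worst quality and its index, rectangle areas, biggest rectangle
    # and its index, and the weighted area WA(0) = q_-(0) * sum_i a_i(0).
    x, y, q = (np.asarray(v, dtype=float) for v in (x, y, q))
    q_minus, i_minus = q.min(), int(q.argmin())
    a = np.diff(x) * np.diff(y)
    a_plus, i_plus = a.max(), int(a.argmax())
    return q_minus, i_minus, a, a_plus, i_plus, q_minus * a.sum()
```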
- If WA(n) < E, then the algorithm aims at increasing the quality q−(n) of the worst data point (the one with the lowest quality), with index i−(n) = argmin_{1≤i≤I(n)} q_i(n) at step n. It stores the corresponding old value y_old, searches for a new value y_new by successively improving the quality of this very point, and stops when y_new > y_old.
- If WA(n) ≥ E, then the algorithm aims at driving the total area A(n) to zero. To that end, it identifies the biggest rectangle
a+(n) = max_{1≤i≤I(n)−1} a_i(n), (7)
and its index
i+(n) = argmax_{1≤i≤I(n)−1} a_i(n), (8)
and adds a new point x_new at the middle of this biggest rectangle. Then, it computes a new data value y_new = G(x_new, q_new) with a new quality q_new.
In both cases, the numerical parameters q_new (for example, a number of iterations, or the size of a sampling set or a population) are arbitrary, and any value can be chosen in practice each time a new point x_new is added to the data set. They can likewise be increased arbitrarily each time such a point has to be improved. Indeed, the numerical parameters q of the optimisation routine we have access to can be increased as much as desired, and increasing them improves the estimates G(x, q) of the true function F†(x) uniformly in x; see Assumption 1. The algorithm then verifies the consistency of the data set by checking the quality of each point. If any point is inconsistent, the algorithm recomputes its value, successively improving the corresponding reliability, until consistency is obtained. This is achieved in a finite number of steps, starting from an inconsistent point and exploring the data set from left to right.
Finally, the algorithm updates the quality vector {q_i(n+1)}_{i=1}^{I(n+1)}, the area vector {a_i(n+1)}_{i=1}^{I(n+1)}, the worst quality q−(n+1) and the index i−(n+1) of the corresponding point, the biggest rectangle a+(n+1) and its index i+(n+1), and then the new weighted area WA(n+1).
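Putting the pieces together, a compact sketch of the main loop might read as follows; the geometric schedule for increasing the optimisation parameters and the quality assigned to new points are illustrative choices, since Algorithm 1 leaves them arbitrary:

```python
import numpy as np

def reconstruct(G, x, y, r, E, n_steps, growth=2.0):
    # Sketch of Algorithm 1. G(x, q) -> observation; x, y, r: initial
    # consistent data with x sorted, x[0] = a, x[-1] = b; E: exchange
    # rate. Consistency is restored at the end of every step, so the
    # quality of each point equals its reliability r_i.
    x, y, r = (np.asarray(v, dtype=float).copy() for v in (x, y, r))
    for _ in range(n_steps):
        a = np.diff(x) * np.diff(y)            # rectangle areas a_i
        if r.min() * a.sum() < E:              # WA(n) < E: redo worst point
            i = int(r.argmin())
            y_old = y[i]
            while y[i] <= y_old:               # stop once the value improves
                r[i] *= growth
                y[i] = G(x[i], r[i])
        else:                                  # WA(n) >= E: split biggest rectangle
            i = int(a.argmax())
            x_new = 0.5 * (x[i] + x[i + 1])
            r_new = r.min()                    # arbitrary quality for the new point
            x = np.insert(x, i + 1, x_new)
            y = np.insert(y, i + 1, G(x_new, r_new))
            r = np.insert(r, i + 1, r_new)
        for j in range(1, len(x)):             # restore consistency, left to right
            while y[j] < y[j - 1]:
                r[j] *= growth
                y[j] = G(x[j], r[j])
    return x, y, r
```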
3.2. Proof of Convergence
We denote by I(n) the number of data points, and by {x_i(n), y_i(n), q_i(n)}_{i=1}^{I(n)} the positions of the data points, the observations given by the optimisation algorithm at these positions, and the associated qualities at step n of Algorithm 1. For each i = 1, ⋯, I(n) − 1, we define s_i(n) = [x_i(n), x_{i+1}(n)) ⊂ [a, b] and the vector containing all rectangle areas {a_i(n)}_{i=1}^{I(n)−1} by:
a_i(n) = (x_{i+1}(n) − x_i(n)) × (y_{i+1}(n) − y_i(n)). (9)
The pointwise observation y_i(n) = G(x_i(n), q_i(n)) is thus associated with the quality q_i(n) ∈ R_+, which quantifies the confidence we have in this observation, as outlined in the problem description of Section 2. This number can represent, for example, the inverse of the error achieved by the optimisation algorithm, the number of iterations, the number of individuals in a population, or any other numerical parameter pertaining to this optimisation process. The higher it is, the closer the observation is expected to be to the true target value. We therefore make the following assumption on the numerical process G.
Assumption 1.
G(x, q) converges to F†(x) as q → +∞, uniformly in x ∈ [a, b]; that is:
∀ε > 0, ∃Q > 0 such that ∀q ≥ Q, ∀x ∈ [a, b], F†(x) − G(x, q) ≤ ε.
Moreover, we can guarantee that:
∀x ∈ [a, b], ∀q ∈ R_+, G(x, q) ≤ F†(x).
That is, the optimisation algorithm always underestimates the true value F†(x). In this way, one can model the relationship between the numerical estimate G and the true value F† as:
∀x ∈ [a, b], ∀q ∈ R_+, G(x, q) = F†(x) − ε(x, q),
where ε is a positive random variable. These assumptions imply some robustness and stability of the algorithm we use.
In the following, we will assume that I(0) ≥ 2; that is, we have at least two data points at the beginning of the reconstruction algorithm, among which one point at x = a and another at x = b. Moreover, we will assume that the initial data set is consistent. Since Algorithm 1 recomputes the inconsistent points at every step, we can also consider in the following that any new numerical observation is actually consistent. Furthermore, we need to guarantee that the weighted area WA(n) oscillates about E permanently as the iteration step n increases; this is the purpose of Assumption 3 below, together with Propositions 1 and 3. From these properties it will then be shown that Algorithm 1 is convergent, as stated in Theorem 1.
Assumption 2. Any new numerical value obtained by Algorithm 1 is consistent.
Assumption 3.
q−(n) → +∞ as n → ∞.
Under Assumption 2, all points have a consistency of 1, and therefore q = r > 0, the reliability. Besides, one has G(x_i(n), q_i(n)) ≤ G(x_{i+1}(n), q_{i+1}(n)), i.e., y_i(n) ≤ y_{i+1}(n), for all points i and steps n. We finally define the sequence of piecewise constant reconstruction functions F(n) as follows.
Definition 1.
For each x ∈ [a, b], we define the reconstruction function F(n) at step n as:
F(n)(x) = ∑_{i=1}^{I(n)−1} y_i(n) 1_{s_i(n)}(x),
and F(n)(x_{I(n)}(n)) = F(n)(b) = y_{I(n)}(n).
Now let
E+ := {n ∈ N; WA(n) ≥ E},  E− := {n ∈ N; WA(n) < E},
which are such that E+ ∪ E− = N and E+ ∩ E− = ∅. In order to prove the convergence (in a sense to be made precise) of Algorithm 1, we first need to establish the intermediate results of Propositions 1, 2, and 3. They clarify the behaviour of the sequence WA(n) when points are added to the data set and the largest area a+(n) is divided into four parts at each iteration step n; see Figure 2.
Proposition 1.
E+ is infinite.
Proof.
Let us assume that E+ is finite: ∃N such that ∀n ≥ N, n ∈ E−. Therefore we are in the situation WA(n) < E, the minimum quality q−(n) of the data goes to infinity by Assumption 3, and the total area A(n) is modified although the evaluation points {x_i(n)}_{i=1}^{I(n)} and their number I(n) are unchanged; thus they are independent of n. Repeating this step yields
lim_{n→∞} A(n) = ∑_{i=1}^{I−1} (x_{i+1} − x_i)(F†(x_{i+1}) − F†(x_i)) =: A > 0
since F† is monotonically increasing and non-constant on [a, b], and Assumption 1 is used. Consequently, WA(n) → +∞ as n → ∞; that is, WA(n) ≥ E for all n ≥ N_1 for some N_1, which is a contradiction. □
The set E+ is therefore of the form
E+ = ⋃_{k≥1} ⟦m_k, n_k⟧,
where
⟦m_k, n_k⟧ := {n ∈ N; m_k ≤ n ≤ n_k}.
Let us introduce the strictly increasing map φ: N → N such that φ(p) is the p-th element of E+ (in increasing order), and ⟦m_k, n_k⟧ = φ(⟦p_k + 1, p_{k+1}⟧). Here, p is the counter of the elements of E+, and n is the corresponding iteration number.
Proposition 2.
Let I(φ(p)) = I(φ(0)) + p. Then
A(φ(p)) = ∑_{i=1}^{I(φ(p))−1} a_i(φ(p)) = O(1/√p)
as p → ∞, and A(n) → 0 as n → ∞.
Proof.
Let k ≥ 1 and n = φ(p) ∈ ⟦m_k, n_k⟧, where p ∈ ⟦p_k + 1, p_{k+1}⟧. Let A(n) be given by Equation (6), a+(n) by Equation (7), and i+(n) by Equation (8). At iteration n + 1 one has:
x_i(n+1) = x_i(n) for 1 ≤ i ≤ i+(n);  x_{i+(n)+1}(n+1) = (x_{i+(n)}(n) + x_{i+(n)+1}(n))/2;  x_i(n+1) = x_{i−1}(n) for i+(n) + 2 ≤ i ≤ I(n+1).
Also, y_i(n+1) ≤ y_{i+1}(n+1) for 1 ≤ i ≤ I(n+1) − 1. One may check that a+(n) = 2a_{i+(n)}(n+1) + 2a_{i+(n)+1}(n+1) (see Figure 2), and therefore:
A(n+1) = A(n) − a+(n) + a_{i+(n)}(n+1) + a_{i+(n)+1}(n+1) = A(n) − (1/2) a+(n).
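For completeness, this identity can be checked directly. Writing w = x_{i+(n)+1}(n) − x_{i+(n)}(n) for the width of the split rectangle and y_m = y_{i+(n)+1}(n+1) for the new observation at its midpoint, both new rectangles have width w/2 and their heights telescope:
a_{i+(n)}(n+1) + a_{i+(n)+1}(n+1) = (w/2)(y_m − y_{i+(n)}(n)) + (w/2)(y_{i+(n)+1}(n) − y_m) = (w/2)(y_{i+(n)+1}(n) − y_{i+(n)}(n)) = (1/2) a+(n),
which is exactly the claimed halving of the largest area.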
Besides, A(n) ≤ (I(n) − 1) a+(n), so that one has:
A(n+1) ≤ A(n) − A(n)/(2(I(n) − 1)) = A(n) · (2(I(n) − 1) − 1)/(2(I(n) − 1)),
or:
A(φ(p) + 1) ≤ A(φ(p)) · (2(I(φ(p)) − 1) − 1)/(2(I(φ(p)) − 1)). (14)
At this stage two situations arise:
- either p ∈ ⟦p_k + 1, p_{k+1} − 1⟧, in which case φ(p) + 1 = φ(p + 1);
- or p = p_{k+1}, in which case by our algorithm A(n) is kept constant from n = n_k + 1 to n = m_{k+1}; that is, A(n_k + 1) = A(m_{k+1}), or:
A(φ(p_{k+1}) + 1) = A(φ(p_{k+1} + 1)).
The choice of k being arbitrary, one concludes that Equation (14) also reads, for all p ∈ N:
A(φ(p+1)) ≤ A(φ(p)) · (2(I(φ(p)) − 1) − 1)/(2(I(φ(p)) − 1)) = A(φ(p)) · (2(I(φ(0)) + p − 1) − 1)/(2(I(φ(0)) + p − 1)).
Thus:
A(φ(p)) ≤ A(φ(1)) ∏_{i=1}^{p−1} (2(I(φ(0)) + i − 1) − 1)/(2(I(φ(0)) + i − 1)) = A(φ(1)) ∏_{i=1}^{p−1} (1 + α/i)/(1 + β/i),
where α = I(φ(0)) − 3/2 and β = I(φ(0)) − 1.
However,
∑_{i=1}^{p} log(1 + α/i) = α ∑_{i=1}^{p} 1/i + C″_p,
where lim_{p→∞} C″_p = C″ exists, and
∑_{i=1}^{p} 1/i = log p + γ + ε′_p,
where γ is the Euler constant and lim_{p→∞} ε′_p = 0. Consequently:
∑_{i=1}^{p−1} log(1 + α/i) − ∑_{i=1}^{p−1} log(1 + β/i) = (α − β) log(p − 1) + C′_p = (α − β) log p + (α − β) log(1 − 1/p) + C′_p = log(1/√p) + C_p,
since α − β = −1/2; again, C_p and C′_p are sequences with finite limits lim_{p→∞} C_p = C and lim_{p→∞} C′_p = C′. Therefore,
∏_{i=1}^{p−1} (1 + α/i)/(1 + β/i) = (C/√p)(1 + ε_p),
where C > 0 is a constant and lim_{p→∞} ε_p = 0; in particular, A(φ(p)) = O(1/√p). One also concludes that A(n), which at each step is either kept constant or equal to A(φ(p)) for some p, converges to 0 as n → ∞. Hence the claimed results hold. □
Proposition 3.
E− is infinite.
Proof.
Let us assume that E− is finite: ∃N such that ∀n ≥ N, n ∈ E+. Therefore we are in the situation WA(n) ≥ E > 0, and φ has the form φ(p) = p + n_0 for all p large enough and some n_0 ∈ N. From Proposition 2, applied with p = n − n_0:
A(n) = A(φ(n − n_0)) = O(1/√n),
thus A(n) → 0 and WA(n) → 0 as n → ∞, since q−(n) is kept unchanged, which is a contradiction. □
We now provide three results on the convergence of Algorithm 1. As is to be expected, the algorithm can only be shown to converge uniformly when the target response functionF†is sufficiently smooth; otherwise, the convergence is at best pointwise or in mean.
Theorem 1 (Algorithm convergence). Assume that F† is strictly increasing. Then, for any choice of E > 0, Algorithm 1 is convergent in the following senses:
- If F† is piecewise continuous on [a, b], then lim_{n→∞} F(n)(x) = F†(x) at all points x ∈ [a, b] where F† is continuous;
- If F† is continuous on [a, b], then convergence holds uniformly: ∥F(n) − F†∥_∞ → 0 as n → ∞.
Proof.
Let E > 0. We know from Propositions 1 and 3 that WA(n) oscillates about E in the iterating process as n → ∞, while lim_{n→∞} q−(n) = +∞ from Assumption 3. Furthermore, let
Δ(n) := sup_{1≤i≤I(n)−1} (x_{i+1}(n) − x_i(n)).
Assuming, for example, that for some j the interval s_j(n) = [x_j(n), x_{j+1}(n)) is never divided in two in the iteration process, and is thus independent of n, it turns out that a_j(n) → (x_{j+1} − x_j)(F†(x_{j+1}) − F†(x_j)) > 0 as n → ∞, which is impossible because A(n) goes to 0 as n → ∞ by Proposition 2. Therefore, there exists some m ∈ N* (depending on n) such that Δ(n + m) ≤ (1/2)Δ(n); also, the sequence Δ(n) is decreasing, hence Δ(n) → 0 as n → ∞.
Now let x ∈ [x_i(n), x_{i+1}(n)). Then:
|F(n)(x) − F†(x)| = |G(x_i(n), q_i(n)) − F†(x)| ≤ |G(x_i(n), q_i(n)) − F†(x_i(n))| + |F†(x_i(n)) − F†(x)|.
Moreover, x_i(n) → x as n → ∞ because Δ(n) → 0; thus, if F† is continuous at x, the second term on the right-hand side above goes to 0 as n → ∞. If F† is continuous everywhere on [a, b], it is in addition uniformly continuous on [a, b] by Heine's theorem, and the second term goes to 0 as n → ∞ uniformly on [a, b]. Finally, invoking Assumption 1, the first term on the right-hand side above also tends to 0 as n → ∞. This completes the proof. □
Proposition 4 (Convergence in mean). Let F†: [a, b] → R be piecewise continuous. Then Algorithm 1 is convergent in mean, in the sense that
∥F(n) − F†∥_1 → 0 as n → ∞.
Proof.
We can check that the sequence F(n) is non-decreasing in n. Indeed, if WA(n) < E, then by construction we have
F(n+1)(x) − F(n)(x) ≥ (y_{i−(n)}(n+1) − y_{i−(n)}(n)) 1_{s−(n)}(x) ≥ 0,
where s−(n) = [x_{i−(n)}(n), x_{i−(n)+1}(n)). If, on the other hand, WA(n) ≥ E, then consistency implies that
F(n+1)(x) − F(n)(x) ≥ (y_{i+(n)+1}(n+1) − y_{i+(n)}(n)) 1_{s+(n+1)}(x) ≥ 0,
where s+(n+1) = [x_{i+(n)+1}(n+1), x_{i+(n)+2}(n+1)). The claim now follows from the monotone convergence theorem and the fact that F(0) is integrable. □
4. Test Cases
To show the effectiveness of Algorithm 1, we try it on two test cases, in which F† is a continuous function and a discontinuous function, respectively. For both cases, the error between the numerical estimate and the ground truth function is modelled as a random variable following a log-normal distribution. That is,
∀x ∈ [a, b], ε(x) ∼ LogN(μ(x), σ²),
with σ² = 1, and μ(x) chosen such that P[0 ≤ ε(x) ≤ 0.1 · F†(x)] = 0.9. Thus, the parameter μ(x) is different for each x ∈ [a, b].
As we have access to the ground truth function, and for validation purposes, the quality value associated with a numerical point is taken to be the inverse of the relative error. Moreover, we assume that the initial points are consistent.
For illustrative purposes, we set the parameter E = 15 for the examples considered below.
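For reference, the noise model and quality assignment just described can be implemented as follows (a sketch; the quantile computation follows the defining condition P[0 ≤ ε(x) ≤ 0.1 · F†(x)] = 0.9 with σ = 1, while the function names are ours):

```python
import numpy as np
from scipy.stats import norm

SIGMA = 1.0

def mu_of_x(F_true_x, p=0.9, frac=0.1):
    # Solve P[eps <= frac * F†(x)] = p for eps ~ LogN(mu, SIGMA^2),
    # i.e. Phi((log(frac * F†(x)) - mu) / SIGMA) = p.
    return np.log(frac * F_true_x) - SIGMA * norm.ppf(p)

def observe(F_true_x, rng):
    # One under-estimating observation y = F†(x) - eps with eps > 0.
    eps = rng.lognormal(mean=mu_of_x(F_true_x), sigma=SIGMA)
    return F_true_x - eps

def quality(y, F_true_x):
    # Validation-only quality: the inverse of the relative error.
    return abs(F_true_x) / abs(F_true_x - y)
```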
4.1. F† Is a Continuous Function
First, consider the function F† ∈ C^0([1, 2], [1, 2]) defined as follows:
F†(x) = F_1†(x) if x ∈ [1, 3/2],  F_2†(x) if x ∈ [3/2, 2],
with
F_1†(x) = a_1 exp(x³) + b_1,  F_2†(x) = a_2 exp((3 − x)³) + b_2, (16)
where:
a_1 = −1/(2(exp(1) − exp(27/8))),  b_1 = (3 − 2 exp(19/8))/(2(1 − exp(19/8))),  a_2 = −a_1,  b_2 = 2a_1 exp(27/8) + b_1.
The target function F† and the reconstructions F(n) obtained with the algorithm for several values of the step n are shown in Figure 3. For each n, the reconstruction F(n) is increasing, and the initial points are consistent. The ∞-norm and 1-norm of the error appear to converge to zero with approximate rates −0.512 and −0.534, respectively.
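Such rates can be estimated, for example, by a least-squares fit of the logarithm of the error against the logarithm of the step count (a sketch; the recorded error sequence is assumed available):

```python
import numpy as np

def empirical_rate(steps, errors):
    # Slope of log(error) against log(step): the exponent in the
    # power-law fit error ~ step^rate, e.g. about -0.51 here.
    slope, _ = np.polyfit(np.log(steps), np.log(errors), 1)
    return slope
```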
4.2. F† Is a Discontinuous Function
Now, consider the discontinuous function F† defined as follows:
F†(x) = F_1†(x) if x ∈ [1, 3/2],  F_2†(x) if x ∈ (3/2, 2],
where F_1† is given by Equation (16), F_2†(x) = a_2 exp(x³) + b_2 (so that F_2† is increasing with the constants below), and:
a_1 = −1/(2(exp(1) − exp(27/8))),  b_1 = (3 − 2 exp(19/8))/(2(1 − exp(19/8))),  a_2 = 2/(5(exp(8) − exp(27/8))),  b_2 = (10 − 8 exp(37/8))/(5(1 − exp(37/8))).
Here, F† is piecewise continuous, being continuous on [1, 3/2] and on (3/2, 2]. In this case, one can apply Proposition 4. The target function F† and the reconstructions F(n) obtained with the algorithm for several values of the step n are shown in Figure 4. Observe that the approximation quality, as measured by the ∞-norm of the error F† − F(n), quite rapidly saturates and does not converge to zero. This is to be expected for this discontinuous target F†, since closeness of two functions in the supremum norm requires that they have approximately the same discontinuities in exactly the same places. The 1-norm error, in contrast, appears to converge at the rate −0.561.
Regarding computational cost, the number of calls to the numerical model is lower when F† is continuous than when it is discontinuous. For both examples above and for the same number of data points, the number of evaluations of the numerical model (an analytical formula in the present case) in the discontinuous case is about six times higher than in the continuous case. This is because the algorithm typically adds more points near discontinuities, and the effort of making them consistent increases the number of calls to the model.
4.3. Influence of the User-Defined Parameter E
We consider the case in which F† is discontinuous, as in Section 4.2, and show the influence of the choice of the parameter E on the reconstruction function F(n).
4.3.1. Case E ≪ 1
Let us consider the case E = 10^−4 ≪ 1. This choice corresponds to the case where one prefers splitting the biggest rectangle over redoing the worst-quality point. This can be seen in Figure 5, where the worst quality is almost constant over 100 steps while the sum of areas decreases strongly; see Figure 5e and Figure 5f, respectively. At each step, the algorithm adds a new point by splitting the biggest rectangle. One can note in Figure 5e that the minimum of the quality is not quite constant: whenever the addition of a new data point left the worst-quality point inconsistent, that point had to be recomputed. In summary, in this case we obtain more points, but with lower quality values.
4.3.2. Case E ≫ 1
We now consider the case E = 10^4 ≫ 1. This choice corresponds to the case where one prefers redoing the worst-quality point over splitting. This can be seen in Figure 6, where the sum of areas stays more or less the same over 100 steps while the minimum of the quality surges; see Figure 6f and Figure 6e, respectively. No new point is added: the algorithm only redoes the worst-quality point in order to improve it. To sum up, we obtain fewer points, but with higher quality values.
5. Application to Optimal Uncertainty Quantification
5.1. Optimal Uncertainty Quantification
In the optimal uncertainty quantification paradigm proposed by Owhadi et al. [7] and further developed by, e.g., Sullivan et al. [8] and Han et al. [9], upper and lower bounds on the performance of an incompletely specified system are calculated via optimisation problems. More concretely, one is interested in the probability that a system, whose output is a function g†: X → R of inputs Ξ distributed according to a probability measure μ† on an input space X, satisfies g†(Ξ) ≤ x, where x is a specified performance threshold value. We emphasise that although we focus on a scalar performance measure, the input Ξ may be a multivariate random variable.
In practice, μ† and g† are not known exactly; rather, it is known only that (μ†, g†) ∈ A for some admissible subset A of the product of the space of all probability measures on X with the set of all real-valued functions on X. Thus, one is interested in
P̲_A(x) := inf_{(g,μ)∈A} P_{Ξ∼μ}[g(Ξ) ≤ x]  and  P̄_A(x) := sup_{(g,μ)∈A} P_{Ξ∼μ}[g(Ξ) ≤ x].
The inequality
0 ≤ P̲_A(x) ≤ P_{Ξ∼μ†}[g†(Ξ) ≤ x] ≤ P̄_A(x) ≤ 1
is, by definition, the tightest possible bound on the quantity of interest P_{Ξ∼μ†}[g†(Ξ) ≤ x] that is compatible with the information used to specify A. Thus, the optimal UQ perspective enriches the principles of worst- and best-case design to account for distributional and functional uncertainty. We concentrate our attention hereafter, without loss of generality, on the least upper bound P̄_A(x).
Remark 1.
The main focus of this paper is the dependency of P̄_A(x) on x. In practice, an underlying task is, for any individual x, reducing the calculation of P̄_A(x) to a tractable finite-dimensional optimisation problem. Central enabling results here are the reduction theorems of Owhadi et al. [7] (Section 4), which, loosely speaking, say that if, for each g, {μ | (μ, g) ∈ A} is specified by a system of m equality or inequality constraints on expected values of arbitrary test functions under μ, then for the determination of P̄_A(x) it is sufficient to consider only distributions μ that are convex combinations of at most m + 1 point masses; the optimisation variables are then the m independent weights and m + 1 locations in X of these point masses. If μ factors as a product of distributions (i.e., Ξ is a vector with independent components), then this reduction theorem applies componentwise.
As a function of the performance threshold x, P̄_A(x) is increasing, and so it is potentially advantageous to determine P̄_A(x) jointly for a wide range of x values using the algorithm developed above. Indeed, determining P̄_A(x) for many values of x, rather than just one value, is desirable for multiple reasons:
- Since the numerical optimisation used to determine P̄_A(x) may be affected by errors, computing several values of P̄_A(x) makes it possible to validate their consistency, as the function x ↦ P̄_A(x) must be increasing;
- The function P̄_A(x) can be discontinuous. Thus, by computing several values of P̄_A(x), one can highlight potential discontinuities and identify key threshold values of x ↦ P̄_A(x).
5.2. Test Case
For the application of Algorithm 1 to OUQ, we study the robust shape optimisation of the two-dimensional RAE2822 airfoil [10] (Appendix A6) using ONERA's CFD software elsA [11]. The following example is taken from Dumont et al. [12]. The shape of the original RAE2822 is altered using four bumps located at four different positions: 5%, 20%, 40%, and 60% of the way along the chord c (see Figure 7). These bumps are characterised by B-spline functions.
The lift-to-drag ratio C_l/C_d of the RAE2822 wing profile (see Figure 8) at Reynolds number Re = 6.5 × 10^6, Mach number M∞ = 0.729, and angle of attack α = 2.31° is chosen as the performance function g†, with inputs Ξ = (Ξ_1, Ξ_2, Ξ_3, Ξ_4), where Ξ_i is the amplitude of the i-th bump. These amplitudes are considered as random variables over their respective ranges given in Table 1.
The corresponding flow values are the ones described in test case #6, together with the wall interference correction formulas given in [13] (Chapter 6) and in [14] (Section 5.1). Moreover, we will assume that (Ξ_i)_{i=1⋯4} are mutually independent. An ordinary Kriging procedure has been chosen to build a metamodel (or response surface) of g†, which is identified with the actual response function g† in the subsequent analysis. A tensorised grid of 9 equidistributed abscissas for each parameter is used; the model is then based on N = 9^4 = 6561 observations. In that respect, a Gaussian kernel
K(Ξ, Ξ′) = exp(−(1/2) ∑_{i=1}^{4} (Ξ_i − Ξ′_i)²/γ_i²)
has been chosen, where Ξ = (Ξ_1, Ξ_2, Ξ_3, Ξ_4) and Ξ′ = (Ξ′_1, Ξ′_2, Ξ′_3, Ξ′_4) are inputs of the function g†, and where γ = (γ_1, γ_2, γ_3, γ_4) are the parameters of the kernel. These parameters are chosen to minimise the variance between the ground truth data defined by the N observations and their Kriging metamodel g†. The response surfaces in the (Ξ_1, Ξ_3) plane for two values of (Ξ_2, Ξ_4) are shown in Figure 9.
One seeks to determineP¯A(x):=supμ∈A PΞ∼μ[g†(Ξ)≤x], where the admissible setAis defined as follows:
A = {(g, μ) :  Ξ ∈ X = X_1 × X_2 × X_3 × X_4;  g: X → Y is known, equal to g†;  μ = μ_1 ⊗ μ_2 ⊗ μ_3 ⊗ μ_4;  E_{Ξ∼μ}[g(Ξ)] = L/D}. (17)
A priori, findingP¯A(x)is not computationally tractable because it requires a search over a infinite-dimensional space of probability measures defined byA . Nevertheless, as described briefly in Remark 1, it has been shown in Owhadi et al. [7] that this optimisation problem can be reduced to a finite-dimensional one, where now the probability measures are products of finite convex combinations of Dirac masses.
Remark 2.
The ground truth law μ_i† of each input variable given in Table 1 is only used to compute the expected value L/D = E_{Ξ∼μ†}[g(Ξ)]. This expected value is computed with 10^4 samples.
Remark 3.
The admissible set A from Equation (17) can be understood as follows:
- One knows the range of each input parameter (Ξ_i)_{i=1,⋯,4};
- g is exactly known, with g = g†;
- (Ξ_i)_{i=1,⋯,4} are independent;
- One only knows the expected value of g: E_{Ξ∼μ}[g(Ξ)].
The optimisation problem of determiningP¯A(x) for each chosen x was solved using the Differential Evolution algorithm of Storn and Price [15] within the mystic optimisation framework [16]. Ten iterations of Algorithm 1 have been performed usingE=1×104. The evolution ofP¯A(x) as function of the iteration count, n, is shown in Figure 10. Atn=0 —see Figure 10a—two consistent points are present atx=57.51andx=67.51. At this step,WA(0)=35289. AsWA(0)≥E, at next stepn=1 , the algorithm adds a new point at the middle of the biggest rectangle—see Figure 10b and Figure 11b. Aftern=10steps, eight points are now present in total with a minimum quality increasing from 5000 to 11,667 and with a total area decreasing from7.05to0.84 ; see Figure 11a and Figure 11b respectively.
The number of iterations in this complex numerical experiment has been limited to 10 because obtaining a new or improved data point that is consistent with the rest of the data set may take up to two days of wall-clock time per point (on a personal computer equipped with an Intel Core i5-6300HQ processor with 4 cores and 6 MB of cache memory). This running time increases further for data points of higher quality. Nevertheless, this experiment shows that the proposed algorithm can be used for real-world examples in an industrial context.

6. Concluding Remarks
In this paper, we have developed an algorithm to reconstruct a monotonically increasing function such as the cumulative distribution function of a real-valued random variable, or the least upper bound of the performance criterion of a system as a function of its performance threshold. In particular, this latter setting is relevant to the optimal uncertainty quantification (OUQ) framework of [7], which we have in mind for applications to real-world incompletely specified systems. The algorithm uses imperfect pointwise evaluations of the target function, subject to partially controllable one-sided errors, to direct further evaluations, either at new sites in the function's domain or at already-evaluated sites whose quality is to be improved. It allows for some flexibility in targeting either strategy through a user-defined “exchange rate” parameter, yielding an approximation of the target function with a few high-quality points or, alternatively, more lower-quality points. We have studied its convergence properties and have applied it to several examples: known target functions that are either continuous or discontinuous, and a performance function for the aerodynamic design of a well-documented standard profile in the OUQ setting.
Algorithm 1 is reminiscent of the classical PAVA approach to isotonic regression, which applies to statistical inference with order restrictions. Examples of its use can be found in shape-constrained or parametric density problems, as illustrated in, e.g., [6]. Possible improvements and extensions of our algorithm include weighting the areas a_i(n) as they are summed up to form the total weighted area WA(n) driving the iterative process, in order to optimally enforce both the addition of “steps” s_i(n) in the reconstruction function F(n) of Definition 1 and the improvement of their “heights” y_i(n). This could be achieved by considering, for example, the alternative definition i+(n) = argmax_i {(I(n) − i − 1) a_i(n)} in Algorithm 1, which results in both adding a step to the i+(n)-th current one and possibly improving all subsequent evaluations y_i(n+1), i > i+(n). We may further envisage adapting the ideas elaborated in this research to the reconstruction of convex functions by extending the notion of consistency. These perspectives shall be considered in future works.
Table 1. Range and ground truth law of each bump amplitude.
 | Range | Law |
---|---|---|
Bump 1: Ξ_1 | [−0.0025c, +0.0025c] | μ_1†: Beta law with α = 6, β = 6 |
Bump 2: Ξ_2 | [−0.0025c, +0.0025c] | μ_2†: Beta law with α = 2, β = 2 |
Bump 3: Ξ_3 | [−0.0025c, +0.0025c] | μ_3†: Beta law with α = 2, β = 2 |
Bump 4: Ξ_4 | [−0.0025c, +0.0025c] | μ_4†: Beta law with α = 2, β = 2 |
Author Contributions
Conceptualization, L.B. and T.J.S.; methodology, L.B. and T.J.S.; software, L.B.; validation, J.-L.A., É.S., and T.J.S.; formal analysis, L.B., J.-L.A., É.S., and T.J.S.; investigation, L.B.; resources, L.B., J.-L.A., É.S., and T.J.S.; data curation, L.B.; writing-original draft preparation, L.B.; writing-review and editing, L.B., J.-L.A., É.S., and T.J.S.; visualization, L.B.; supervision, É.S. and T.J.S.; project administration, T.J.S.; funding acquisition, L.B., É.S., and T.J.S. All authors have read and agreed to the published version of the manuscript.
Funding
The work of J.-L.A. and É.S. has been partially supported by ONERA within the Laboratoire de Mathématiques Appliquées pour l'Aéronautique et Spatial (LMA2S). L.B. is supported by a CDSN grant from the French Ministry of Higher Education (MESRI) and a grant from the German Academic Exchange Service (DAAD), Program #57442045. T.J.S. has been partially supported by the Freie Universität Berlin within the Excellence Strategy of the DFG, including project TrU-2 of the Excellence Cluster "MATH+ The Berlin Mathematics Research Center" (EXC-2046/1, project 390685689) and DFG project 415980428.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
CFD | Computational Fluid Dynamics |
OUQ | Optimal Uncertainty Quantification |
PAVA | Pool-Adjacent-Violators Algorithm |
© 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”).
Abstract
Motivated by the desire to numerically calculate rigorous upper and lower bounds on deviation probabilities over large classes of probability distributions, we present an adaptive algorithm for the reconstruction of increasing real-valued functions. While this problem is similar to the classical statistical problem of isotonic regression, the optimisation setting alters several characteristics of the problem and opens natural algorithmic possibilities. We present our algorithm, establish sufficient conditions for convergence of the reconstruction to the ground truth, and apply the method to synthetic test cases and a real-world example of uncertainty quantification for aerodynamic design.