Surrogate Models: Hans Bruun Nielsen IMM, Numerical Analysis Section

Download as pdf or txt
Download as pdf or txt
You are on page 1of 5

24.11.2004 Surrogate Models 1 24.11.

2004 Surrogate Models 3

Residuals y − P (x )
i 16 i
Surrogate Models Poor approximation. 1

Polynomials have too “long memory”. 0.5


Hans Bruun Nielsen
The Taylor expansion 0
IMM, Numerical Analysis Section X
n
1 k (k) −0.5
Pn(x+h) = Pn(x) + h Pn (x)
k!
k=1 −1
• Alternatives to physically based mathematical models and local Taylor expansions is exact for any h
1.5 2 2.5 3 3.5 4 4.5

< x < 3 : Increase n


Trends for 2.2 ∼
• Metamodels, Surface Response, Neural Networks, . . . ∼
< 2 and x > 3.5
Degree too high for x ∼ ∼
• Space Mapping

• Radial Basis Functions (RBF)

• Kriging, “Design and Analysis of Computer Experiments” (DACE)


Cubic Splines
Approximation tools. Interested in applications in Information that should be carried by 3rd and higher derivatives is lost.
Local nature. Put knots where they are needed.
• Data representation (fitting)
M.J.D. Powell: Curve fitting by splines in one variable
• Optimization
pp 65 –83 in J.G. Hayes (ed): Numerical approximation to functions and data, 1970.

24.11.2004 Surrogate Models 2 24.11.2004 Surrogate Models 4

Data Fitting 123 data points with noise Knots κ0, κ1, . . . , κn Fit with natural cubic spline, n = 11
12 12

Given {(xi , yi)}m


i=1 , yi = Y (xi) + ei
10
X
n+3 10

8 s(x) = cj Bj (x) 8

6 6
Seek (an approximation to) Y (x) j=1
4 4

2 Basis spline Bj is nonzero only in four 2


May have a mathematical model
0 0
1.5 2 2.5 3 3.5 4 4.5 consecutive knot intervals. Local support. 1.5 2 2.5 3 3.5 4 4.5
Y (x) ' M (p, x)
cj has influence only in [κj−4, κj ] Residuals yi − s(xi)
Parameters p eg determined by minimizing
X 12
Fit with degree 16 polynomial 1

%(p) = wi2 (yi − M (p, xi))2 10


Basis spline B8
0.5

8 0.6
0
6

4 0.4 −0.5

In lack of a proper model we may use 2


−1
0.2 1.5 2 2.5 3 3.5 4 4.5
0
a polynomial as “surrogate model”. 1.5 2 2.5 3 3.5 4 4.5

0
2.3 2.7 3.1 3.5
24.11.2004 Surrogate Models 5 24.11.2004 Surrogate Models 7

Number of basis functions for Pn 1.5


x ∈ IRd. Polynomials and splines generalize. Level curves of a function that we want to approx-
d 1 2 3 5 10 50 1
Curse of dimensionality imate show eg, that we need close knots in both
n=1 2 3 4 6 11 51
directions at (0.6, 0.5). 0.5
Interpolation or fitting, 2 3 6 10 21 66 1326
3 4 10 20 56 286 23426
Bc'f 0
4 5 15 35 126 1001 316251
Serious risk of rank deficient B. −0.5
5 6 21 56 252 3003 3478761 −1 −0.5 0 0.5

Also close where it is not needed. The system, 1.5


Basis spline B
Bicubic splines (x = (u, v) ∈ IR ). 2 6,7
Ac'f 1
nX X
1 +3 n 2 +3
is either rank deficient or needs many “superfluous” 0.5
s(x) = cij Bi(ξ, u)Bj (η, v)
i=1 j=1
data points.
0
ξ and η : knots in u and v-directions, resp.
−0.5
−1 −0.5 0 0.5

24.11.2004 Surrogate Models 6 24.11.2004 Surrogate Models 8

Example. Apnea. Measurements of pressure in throat as function of Gaussian. θ = 10

distance (22 values in ]0, 10[ cm) and time (every 0.1 second). Alternative approximating function.
Given data points (xi, yi), i = 1, 2, . . . , m
Data Fit
with distinct xi ∈ IRd and yi ∈ IR
Surrogate model
X
m X
n
1 1 s(x) = cT φ(x) + β T ψ(x) = cj φj (x) + βj ψj (x)
j=1 j=1
0 0 where the ψj are basis functions eg for a low order poly-
−1 −1
nomial that models a “global trend”, and Gaussian. θ = 50
5 5
4300
4200
4300
4200
φj (x) = φ(kx − xj k2)
4100 0 4100 0
Eg a Gaussian
φ(r) = e−θ r
2

Missing data and “wild points”.


n1 = 8, sliding window with one knot per 20 profiles (2 seconds). The figures show x ∈ [−1, 1]2

Byproduct: Data compression.


24.11.2004 Surrogate Models 9 24.11.2004 Surrogate Models 11

Surrogate models based on Kriging and Radial Basis Functions (RBF) both have the form The solution can be reformulated to
T T
s(x) = c φ(x) + β ψ(x) . s(x) = cT φ(x) + β T ψ(x) .
Different derivation, but (under certain conditions on φ): same model. β is the “generalized least squares” solution to Ψβ ' y and c is found from the corresponding residual,
We consider interpolation, ie s(xi) = yi, i=1, . . . , m. ¡ T −1 ¢
Ψ Φ Ψ β = ΨT Φ−1y, c = Φ−1(y − Ψβ). Same solution!
Let Φ ∈ IRm×m, Ψ ∈ IRm×n be the matrices defined by 2 1
The process variance is estimated by σ = (y − Ψβ)T Φ−1(y − Ψβ) ,
m
Φij = φ(kxi − xj k2), Ψij = ψj (xi) and the MSE can be computed by
³ ¡ ¢−1 ´
The interpolation condition can be expressed as Ω(x) = σ 2 1 + uT ΨT Φ−1Ψ u − φT Φ−1φ , u = ΨT Φ−1φ − ψ .
Φ c + Ψ β = y.
In classical Kriging the correlation function φ is estimated from a variogram of the data, or it may be
2
In the case of RBF this is combined with the condition that c should be orthogonal to the range of Ψ, taken from a list. We may, eg, use a Gaussian: φj (x) = e−θ kx−xj k .
and we get the linear system of equations The results depend on θ, and assuming a Gaussian process the optimal choice is
à !à ! à ! à !à ! à !
Φ Ψ c y Φ Ψ c y θopt = argmin{ |Φ(θ)|1/m · σ 2(θ) } ( | · | = det(·) ) .
= ⇔ = .
ΨT 0 β 0 0 −ΨT Φ−1Ψ β −ΨT Φ−1y
¡ ¢−1 Matlab toolbox DACE (with Søren N. Lophaven and Jacob Søndergaard)
Solution: β = ΨT Φ−1Ψ P siT Φ−1y, c = Φ−1 (y − Ψβ) .
Available at http://www.imm.dtu.dk/∼hbn/dace/

24.11.2004 Surrogate Models 10 24.11.2004 Surrogate Models 12

Kriging is a statistical approach. Example. Rosenbrock’s function. n = 1, ψ(x) = 1



T Start with 9 points. Best θ ∈ [0.1, 100] . RM S = Ω
s(x) = z(x) + β ψ(x) ,
£ ¤
where z is stochastic with mean 0 and covariances E z(x), z(xj ) = σ 2φj (x). Process variance σ 2 m = 9. θ = 1.046
£ ¤ 200 RMS. m = 9 |s − f|. m = 9
Apply this model to the given data to get y = Z + Ψβ, E Z Z T = σ2 Φ . 1.5 1.5
100 30

1 25 1
Express s(x) as a linear predictor, s(x) = y T γ(x) . The error is 0
¡ ¢ ¡ ¢ 1.5 20
s(x) − y(x) = (Z + Ψβ)T γ(x) − z(x) + β T ψ(x) = Z T γ(x) − z(x) + β T ΨT γ(x) − ψ(x) 0.5 0.5
1 15

T 0.5 10
Unbiased predictor. Constraint: Ψ γ(x) − ψ(x) = 0 . 0 0
0 5
Mean squared error (MSE) 0 0.5 −0.5 −0.5
−0.5 −1 −0.5 −1 −0.5 0 0.5 −1 −0.5 0 0.5
£ ¤ £ ¤ ¡ ¢
Ω(x) = E (s(x) − y(x)) = E z 2 + γ T Z Z T γ − 2γ T Z x = σ 2 1 + γ T Φ γ − 2γ T ψ
p
Minimize Ω with respect to γ and subject to the constraint: Successively insert new data points where s(x) − .05 Ω(x) is minimal.
à !à ! à !
Φ Ψ γ φ
=
ΨT 0 λ ψ
24.11.2004 Surrogate Models 13 24.11.2004 Surrogate Models 15

m = 10. θ = 1.174 RBF. Gaussian. Choose θ


200 RMS. m = 10 |s − f|. m = 10 RBF. m = 9. θ = 1.046 RBF. m = 9. θ = 0.200 RBF. m = 9. θ = 5.000
1.5 1.5
100 30 200 200 200

1 25 1 100 100 100


0
1.5 20
0 0 0
0.5 0.5 1.5 1.5 1.5
1 15

0.5 10 1 1 1
0 0
0 5 0.5 0.5 0.5
0 0.5 −0.5 −0.5
−0.5 −1 −0.5 −1 −0.5 0 0.5 −1 −0.5 0 0.5
0 0 0
0 0.5 0 0.5 0 0.5
−0.5 −1 −0.5 −0.5 −1 −0.5 −0.5 −1 −0.5

m = 11. θ = 1.660
|s − f|. θ = 1.046 |s − f|. θ = 0.200 |s − f|. θ = 5.000
200 RMS. m = 11 |s − f|. m = 11 1.5 1.5 1.5
1.5 1.5 30 30 30
100 30
1 25 1 25 1 25
1 25 1
0
20 20 20
1.5 20
0.5 0.5 0.5
15 15 15
0.5 0.5
1 15
10 10 10
0.5 10 0 0 0
0 0 5 5 5
0 5
−0.5 −0.5 −0.5
0 0.5 −0.5 −0.5 −1 −0.5 0 0.5 −1 −0.5 0 0.5 −1 −0.5 0 0.5
−0.5 −1 −0.5 −1 −0.5 0 0.5 −1 −0.5 0 0.5

24.11.2004 Surrogate Models 14 24.11.2004 Surrogate Models 16

¡ ¢−1/2
m = 12. θ = 2.636 RBF. Inverse multiquadric: φj (x) = θkx − xj k2 + 1
200 RMS. m = 12 |s − f|. m = 12
1.5 1.5 m = 9. θ = 1.0e−004 m = 9. θ = 1.0e−003 m = 9. θ = 1.0e−002
100 30
200 200 200
1 25 1
0
100 100 100
1.5 20
0.5 0.5 0 0 0
1 15
1.5 1.5 1.5
0.5 10
0 0 1 1 1
0 5
0.5 0.5 0.5
0 0.5 −0.5 −0.5
−0.5 −1 −0.5 −1 −0.5 0 0.5 −1 −0.5 0 0.5
0 0 0
0 0.5 0 0.5 0 0.5
−0.5 −1 −0.5 −0.5 −1 −0.5 −0.5 −1 −0.5
m = 13. θ = 2.636
200 RMS. m = 13 |s − f|. m = 13 |s − f|. m = 9 |s − f|. m = 9 |s − f|. m = 9
1.5 1.5 1.5 1.5 1.5
100 30

1 25 1 1 1 1
0
1.5 20
0.5 0.5 0.5 0.5 0.5
1 15

0.5 10
0 0 0 0 0
0 5

0 0.5 −0.5 −0.5 −0.5 −0.5 −0.5


−0.5 −1 −0.5 −1 −0.5 0 0.5 −1 −0.5 0 0.5 −1 −0.5 0 0.5 −1 −0.5 0 0.5 −1 −0.5 0 0.5
24.11.2004 Surrogate Models 17

Plans for future work

• Better error estimation for Kriging

• Extend DACE to cope with errors in data

• Strategy for use in optimization

• Choice of θ in RBF

• Extend DACE to cope with other RBFs

• ···

With Kristine Frisenfeldt Thuesen

You might also like