Chapter 5 - Excursion B-Splines - Commented2

Advanced Statistical Learning
Chapter 5: Excursion: B-Splines

Bernd Bischl, Julia Moosbauer, Andreas Groll
Department of Statistics – TU Dortmund

Winter term 2020/21
Splines
Bernd Bischl, Julia Moosbauer, Andreas Groll c Winter term 2020/21 Advanced Statistical Learning – 5 – 1 / 46
SPLINES
In this chapter we will regard a specific class of basis functions, the
so-called B-splines, in detail.
Special thanks to Prof. Brian Marx and Prof. Paul

Eilers for providing most of the slides!!!!
For an in-depth discussion of P-splines, see their new book:

P.H.C. Eilers and B.D. Marx (2020). Practical Smoothing: The Joys of
P-splines. Cambridge University Press.
Basics of Bases
BASIC LINEAR REGRESSION
Scatterplot: pairs (xi , yi ), i = 1, . . . , m

Assumption: straight line fits data well
Equation: µi = α0 + α1 · xi
BASIC LINEAR REGRESSION
Least squares: minimize
n
X n
X
2
S= (yi − α0 − α1 xi ) = (yi − µi )2
i =1 i =1
Matrix notation:
 
1 x1
1 x2 
α
X = . .  α= 0 µ = Xα
 
 .. ..  α1
1 xn
Minimize:
ky − Xα k2 =⇒ XT Xα̂
α = XT y =⇒ α = (XT X)−1 XT y
α̂
CURVED RELATIONSHIPS
Linear fit is not always OK

Judged by eye, or after studying residuals
CURVED RELATIONSHIPS
Linear fit too simple? Add higher powers of x:
µi = α0 + α1 xi + α2 xi2 + α3 xi3 + . . .
More columns in matrix X
x12 x13 . . . x1m

 
1 x1
1 x2 x22 x23 . . . x2m 
X = . .
 
.. .. .. .. 
 .. .. . . . . 
1 xn xn2 xn3 . . . xnm
Same regression equations:

XT Xα̂
α = XT y =⇒ α̂ α = (XT X)−1 XT y
BASIS FUNCTIONS
Regression model µ = Xα
Columns of X: basis functions. Polynomial basis
With sorted x for nice visual representation
Caution: Higher degree polynomials are numerically unstable!

EXAMPLE: MOTORCYCLE DATA
Simulated crash experiment

Acceleration of motorcycle helmets measured
High degree needed for decent curve fit
Bad numerical condition (use orthogonal polynomials)
Slight data changes can change the fit by a huge amount
Longer left part (near zero)
Notice wiggles
TROUBLE WITH POLYNOMIALS
High degree (10 or more) may be needed

Basis functions (powers of x) are global
Moving one end (vertically) moves other end too
Good fit at one end spoils things at other end
Unexpected wiggles
The higher the degree the more sensitive
Global polynomials are a dead end
=⇒ Use local instead of global functions!
B-Splines
B-SPLINE TYPES
Definition of B-Splines: Basis functions are only non-zero over the
intervals between d + 2 adjacent knots, where d is the degree of the
basis (for example d = 3 for a cubic spline). To define an M parameter
B-spline basis, we need to define M + d + 1 knots,
t1 < t2 < . . . < tM +d +1 , where the interval over which the spline is to
be evaluated lies within [td +1 , tM +1 ] (so that the first and last d knot
locations are essentially arbitrary). A d-th degree spline can then be
represented as
M
X
f (x ) = αj Bj (x ; d ) ,
j =1
where αi are unknown spline coefficients.
B-SPLINE TYPES
The i-th B-spline basis of degree d is defined recursively as:
x − tj tj +d +1 − x
Bj (x ; d ) = Bj (x ; d −1)+ Bj −1 (x ; d −1), j = 1, . . . , M
tj +d − tj tj +d +1 − tj +1
where (
1, if x ∈ [tj , tj +1 ),
Bj (x ; 0) =
0, otherwise.
More detailed information about the B-spline basis can be found for
example in Eilers and Marx (1996).
B-SPLINE TYPES
Linear B-Spline:
Two pieces, each a straight line, rest zero
Nicely connected at knots (t1 to t3 ): same values
Slope jumps at knots
B-SPLINE TYPES
Quadratic B-Spline:
Three pieces, each a quadratic segment, rest zero
Nicely connected at knots (t1 to t4 ): same values and slopes
Shape similar to Gaussian
B-SPLINE TYPES
Cubic B-Spline:
Four pieces, each a cubic segment, rest zero
At knots (t1 to t5 ): same values, first & second derivatives
Shape more similar to Gaussian
B-SPLINE TYPES
B-SPLINE TYPES
B-SPLINES IN ACTION
Basis matrix B
Columns are B-splines
 
B1 (x1 ) B2 (x1 ) B3 (x1 ) . . . BM (x1 )
B1 (x2 ) B2 (x2 ) B3 (x2 ) . . . BM (x2 )
 
 .. .. .. .. .. 
 . . . . . 
B1 (xn ) B2 (xn ) B3 (xn ) . . . BM (xn )
In each row only a few non-zero elements (degree plus one)

bij αj = BTi· α
P
Only a few basis functions contribute to µi =
B-SPLINES IN ACTION
B-SPLINES IN ACTION
B-SPLINES IN ACTION
B-splines are local functions, look like Gaussian
B-splines are columns of basis matrix B
Scaling and summing gives fitted values: µ = Bα
The knots determine the B-spline basis
Polynomial pieces make up B-splines, join at knots
General patterns of knots are possible
We consider only equal-spacing
Number of knots determines width and number of B-splines
B-SPLINES IN ACTION
P-Splines
SMOOTHNESS AND ROUGHNESS
More B-splines in basis: more detail is possible

But it is not necessary!
Perfectly smooth curves µ = Bα are possible
It all depends on α
P-spline idea: control smoothness of α
Introduce a penalty on roughness of α
While using a “rich” B-spline basis
The coefficients determine roughness
High roughness: α erratic
Little roughness: smoothly varying α
Simple numerical measure:
M
X
R= (αj − αj −1 )2
j =2
p
Or RMS “change to neighbor”: r = R /(M − 1)
DIFFERENCES AND MATRICES
α = (α2 − α1 , . . . , αM − αM −1 )
We are interested in ∆α
Special matrix makes life easy:
 
−1 1 0 0
α = Dα ;
∆α D =  0 −1 1 0
0 0 −1 1
D has M − 1 rows, M columns
ROUGHNESS WITH A MATRIX
Roughness measure R:
α||2 = ||Dα ||2 = α T DT Dα

R = ||∆α
Matrix D easily computed

(In R: D <- diff(diag(M)))
So we have easy tool to express roughness
PENALIZING LEAST SQUARES
We set a double goal:

good fit to the data: low S = ||y − Bα ||2
smooth curve, i.e. low roughness: R = ||Dα ||2
Balance of the two in one function (λ > 0):
Q = S + λR = ||y − Bα||2 + λ||Dα||2
Last term known as penalty

λ is smoothing (or penalty or regularization) parameter
User sets λ (for now, automatic choice later)
Penalized least squares
Minimize
Q = S + λR = ||y − Bα ||2 + λ||Dα ||2

= yT y − 2α T BT y + α T (BT B + λDT D)α
α
Set derivative w.r.t. α zero; result:
(BT B + λDT D)α α = (BT B + λDT D)−1 BT y

α = BT y =⇒ α̂
Small modification of BT Bα = BT y
HIGHER ORDER DIFFERENCES
α = (α2 − α1 , . . . , αM − αM −1 )
First order: ∆α
α) = ∆2α
Second order: ∆(∆α
∆2α = [(αj − αj −1 ) − (αj −1 − αj −2 )]j =3,...,M

= [(αj − 2αj −1 + αj −2 )]j =3,...,M
Matrix in R: D <- diff(diag(M), diff = 2)

 
1 −2 1 0 0
D2 = 0 1 −2 1 0
0 0 1 −2 1
Third order
∆3α = [(αj − 3αj −1 + 3αj −2 − αj −3 )]j =4,...,M
Matrix
 
−1 3 −3 1 0 0
D3 =  0 −1 3 −3 1 0
0 0 −1 3 −3 1
In R: D <- diff(diag(M), diff = 3)
INTERPOLATION WITHOUT A PENALTY
Interpolation with B-splines

α
fit B-splines by regression: ŷ = Bα̂
compute B-splines at new x̃ : Bj (x̃ ), j = 1 . . . M
P
α
µ(x̃ ) = j Bj (x̃ )α̂
Works fine if you can estimate α
Regression may fail with large gaps in x
Then some B-splines have no support
Singular system of equations
INTERPOLATION WITH A PENALTY
Penalty lets elements of α hold hands

They bridge the gap(s) automatically!
EXTRAPOLATION
Extrapolation works the same

Just choose the domain of x wide enough
Take a generous number of (cubic) B-splines
Penalty again bridges the gap
Smooth fit and neat extrapolation automatically
With differences of order d in penalty
INTER- AND EXTRAPOLATION
with d = 1
with d = 2
with d = 3
STRONG SMOOTHING
Example (λ = 106 ):
WRAP-UP
Basis with many B-splines allows detail

But smoothness depends on coefficients α in Bα
Difference penalty on α to tune smoothness
P-splines allow flexible interpolation and extrapolation

Chapter 5 - Excursion B-Splines - Commented2

Uploaded by

Copyright:

Available Formats

Chapter 5 - Excursion B-Splines - Commented2

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chapter 5 - Excursion B-Splines - Commented2

Uploaded by

Copyright:

Available Formats

Advanced Statistical Learning

Chapter 5: Excursion: B-Splines

Department of Statistics – TU Dortmund

Special thanks to Prof. Brian Marx and Prof. Paul

For an in-depth discussion of P-splines, see their new book:

Scatterplot: pairs (xi , yi ), i = 1, . . . , m

Linear fit is not always OK

More columns in matrix X

x12 x13 . . . x1m

Same regression equations:

Caution: Higher degree polynomials are numerically unstable!

Simulated crash experiment

High degree (10 or more) may be needed

=⇒ Use local instead of global functions!

where αi are unknown spline coefficients.

In each row only a few non-zero elements (degree plus one)

More B-splines in basis: more detail is possible

D has M − 1 rows, M columns

α||2 = ||Dα ||2 = α T DT Dα

Matrix D easily computed

We set a double goal:

Q = S + λR = ||y − Bα||2 + λ||Dα||2

Last term known as penalty

Q = S + λR = ||y − Bα ||2 + λ||Dα ||2

Set derivative w.r.t. α zero; result:

(BT B + λDT D)α α = (BT B + λDT D)−1 BT y

∆2α = [(αj − αj −1 ) − (αj −1 − αj −2 )]j =3,...,M

Matrix in R: D <- diff(diag(M), diff = 2)

∆3α = [(αj − 3αj −1 + 3αj −2 − αj −3 )]j =4,...,M

In R: D <- diff(diag(M), diff = 3)

Interpolation with B-splines

Penalty lets elements of α hold hands

Extrapolation works the same

Basis with many B-splines allows detail

You might also like