PWLF Jekel Venter v2
Name: pwlf
Version: 0.4.1
Description: fit piecewise linear functions to data
License: MIT
Creator: Charles F. Jekel
Maintainer: Charles F. Jekel - [email protected]
Homepage: https://github.com/cjekel/piecewise_linear_fit_py
Contributors: https://github.com/cjekel/piecewise_linear_fit_py/graphs/contributors
Abstract
A Python library for fitting continuous piecewise linear functions to one-dimensional data is presented. A continuous piecewise linear function has breakpoints which represent the termination points of the line segments. If the breakpoint locations are known, then a least squares fit is used to solve for the best continuous piecewise linear function. If the breakpoints are unknown but the desired number of line segments is known, then global optimization is used to find the best breakpoint locations. This optimization process solves a least squares fit several times for different breakpoint locations. The paper describes the mathematical methods used in the library, provides a brief overview of the library, and presents a few simple examples to illustrate typical use cases.
1 Introduction
Piecewise linear functions are simple functions which consist of several discrete
linear segments that are used to describe a one-dimensional (1D) dependent
variable. The locations where one line segment ends and a new line begins are
referred to as breakpoints. Enforcing continuity between these line segments
has resulted in a number of interesting models used in a variety of disciplines
[1][2][3][4][5][6][7]. Muggeo [8] provides a review of the various techniques that
have been used to fit such piecewise continuous models.
∗ Dept. of Mechanical & Aerospace Engineering, University of Florida, Gainesville, FL 32611
† Department of Mechanical and Mechatronic Engineering, Stellenbosch University, Stellenbosch, South Africa
This paper introduces a Python library for fitting continuous piecewise linear
functions to 1D data. The library is called pwlf and was first available online
on April 1, 2017. The library includes a number of functions related to fitting
continuous piecewise linear models. For instance, pwlf can be used to fit a
continuous piecewise linear function for a specified number of line segments.
This type of fit is performed using global optimization to minimize the sum-of-
squares error. Two different global optimization strategies are included in the
library’s fit and fitfast functions. A user may additionally specify their own global
optimization routine. The library also provides a number of other statistical
properties associated with these continuous piecewise linear functions.
The paper describes the methodology that pwlf uses to fit continuous piece-
wise linear functions. Essentially a least squares fit can be performed if the
breakpoint locations are known [9]. If the breakpoint locations are unknown, a
global optimization routine is used to find the optimal breakpoint locations by
minimizing the sum-of-squares error. First, the mathematical methods and var-
ious statistical properties available in pwlf are presented. Next, a basic overview
of the pwlf library is provided, followed by a collection of simple examples for
typical use cases. The main highlights of the library are the following:
• simple Python interface for fitting continuous piecewise linear functions
2 Mathematical formulation
This section describes the mathematical methods used in pwlf. The least squares
problem is defined if breakpoints are known. It is additionally possible to force
the linear function through a set of data points by using a constrained least
squares formulation. Various statistics associated with the linear regression
problem are presented. These statistics include: the coefficient of determination,
standard errors, p-values for model parameters, and the prediction variance
of the model. Lastly, an optimization problem is presented to find the best
breakpoint locations. The optimization problem solves the least squares problem
several times for various breakpoint combinations while minimizing the sum-of-squares of the residuals. There is no guarantee that the absolute best (or global minimum) is found, but this limitation is shared by many non-linear regression routines.
2.1 Least squares with known breakpoints
This work assumes some 1D data set, where x is the independent variable.
Additionally, y is dependent on x such that y(x). The data can be paired as
\begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \\ x_3 & y_3 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix} \quad (1)
where (x1 , y1 ) represents the first data point. The data points should be ordered
according to x1 ≤ x2 ≤ x3 ≤ · · · ≤ xn for n number of data points. A piecewise
linear function can be described as the following set of functions
y(x) = \begin{cases}
\eta_1 + m_1(x - b_1) & b_1 < x \le b_2 \\
\eta_2 + m_2(x - b_2) & b_2 < x \le b_3 \\
\vdots & \vdots \\
\eta_{n_b - 1} + m_{n_b - 1}(x - b_{n_b - 1}) & b_{n_b - 1} < x \le b_{n_b}
\end{cases} \quad (2)
(If the data is unordered, it can be sorted with numpy.argsort.)
and

1_{x_n > b_3} = \begin{cases} 0 & x_n \le b_3 \\ 1 & x_n > b_3 \end{cases} \quad (6)
and so forth. This is a simple linear system of equations where

A\beta = y \quad (7)

whose least squares solution is

\beta = (A^T A)^{-1} A^T y. \quad (8)

The residual vector is defined as

e = A\beta - y \quad (9)

where e is an n × 1 vector. The residual vector is the difference between the fitted continuous piecewise linear model and the original data set. The sum-of-squares of the residuals then becomes

SSR = e^T e \quad (10)
(The matrix A will only be lower triangular for particular data and breakpoint combinations.)
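As a concrete sketch of Eqns. (2) through (10), the following assembles a small regression matrix with NumPy and solves the least squares problem for known breakpoints. This is an illustrative reimplementation, not pwlf's internal code; the assemble_A helper and the tent-shaped data set are assumptions made for the example.

```python
import numpy as np

def assemble_A(x, breaks):
    """Assemble the regression matrix A of Eqn. (7) for sorted 1D data x
    and ordered breakpoint locations breaks = (b_1, ..., b_nb)."""
    n, nb = x.size, len(breaks)
    A = np.zeros((n, nb))
    A[:, 0] = 1.0                # intercept column
    A[:, 1] = x - breaks[0]      # first segment slope column
    for k in range(2, nb):
        # indicator-gated columns (x - b_k) * 1_{x > b_k}
        A[:, k] = np.where(x > breaks[k - 1], x - breaks[k - 1], 0.0)
    return A

x = np.linspace(0.0, 1.0, 21)
y = np.where(x < 0.5, x, 1.0 - x)    # "tent" shaped data with a break at 0.5
breaks = np.array([0.0, 0.5, 1.0])

A = assemble_A(x, breaks)
beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)  # Eqn. (8)
e = A @ beta - y                                   # Eqn. (9)
print(float(e @ e))                                # SSR, Eqn. (10)
```

For this exactly piecewise linear data, the recovered parameters are β = (0, 1, −2): slope 1 on the first segment, and a slope change of −2 after the breakpoint at 0.5.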
with n_c number of constrained data points. Let's call x_c the vector of constrained x locations, and y_c the vector of constrained y locations. A new n_c × n_b matrix C is assembled, where

C = \begin{bmatrix}
1 & x_{c1} - b_1 & (x_{c1} - b_2) 1_{x_{c1} > b_2} & \cdots & (x_{c1} - b_{n_b - 1}) 1_{x_{c1} > b_{n_b - 1}} \\
1 & x_{c2} - b_1 & (x_{c2} - b_2) 1_{x_{c2} > b_2} & \cdots & (x_{c2} - b_{n_b - 1}) 1_{x_{c2} > b_{n_b - 1}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{c n_c} - b_1 & (x_{c n_c} - b_2) 1_{x_{c n_c} > b_2} & \cdots & (x_{c n_c} - b_{n_b - 1}) 1_{x_{c n_c} > b_{n_b - 1}}
\end{bmatrix} \quad (12)

Note this is the same procedure used to assemble A, but x_c is used instead of x.
The Karush–Kuhn–Tucker (KKT) equations for the constrained least squares problem become

\begin{bmatrix} 2 A^T A & C^T \\ C & 0 \end{bmatrix}
\begin{bmatrix} \beta \\ \zeta \end{bmatrix}
=
\begin{bmatrix} 2 A^T y \\ y_c \end{bmatrix} \quad (13)

where ζ is the vector of Lagrange multipliers, which is solved along with the β model parameters [10]. The matrix on the left-hand side is square with shape (n_b + n_c) × (n_b + n_c), and 2A^T y is a vector of shape n_b × 1. Note that once β has been solved for, the calculation of the residual vector still follows Eqn. 9.
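The KKT system above can be solved directly with dense linear algebra. The sketch below is an illustration, not pwlf's internal code: it builds the block matrix of Eqn. (13) for a generic design matrix A and constraint matrix C, and the toy problem forces a straight-line fit through the origin.

```python
import numpy as np

def constrained_fit(A, y, C, yc):
    """Solve the KKT system of Eqn. (13): least squares for beta subject
    to the equality constraints C @ beta = yc."""
    nb, nc = A.shape[1], C.shape[0]
    K = np.zeros((nb + nc, nb + nc))
    K[:nb, :nb] = 2.0 * A.T @ A
    K[:nb, nb:] = C.T
    K[nb:, :nb] = C
    rhs = np.concatenate([2.0 * A.T @ y, yc])
    sol = np.linalg.solve(K, rhs)
    return sol[:nb], sol[nb:]   # beta, Lagrange multipliers zeta

# toy problem: a straight line forced through the origin
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 3.0, 3.0])
A = np.column_stack([np.ones_like(x), x])   # intercept + slope columns
C = np.array([[1.0, 0.0]])                  # model row evaluated at x = 0
yc = np.array([0.0])
beta, zeta = constrained_fit(A, y, C, yc)
print(beta)   # intercept is forced to exactly 0
```

The constrained slope works out to Σxy/Σx² = 16/14 here, which is the least squares slope once the intercept is pinned at zero.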
where ȳ is the mean of y. The total sum-of-squares depends only on the observed data points, and is independent of the fitted model. Then the coefficient of determination is obtained from

R^2 = 1 - \frac{SSR}{SST} \quad (15)

where SSR is the sum-of-squares of the residuals as defined in Eqn. 10.
2.3.2 Standard error for each model parameter
The standard error can be calculated for each model parameter. The standard
error represents the estimate of the standard deviation of each β parameter
due to noise in the data. This derivation follows the standard error calculation
presented in Coppe et al. [11] for linear regression problems.
First the unbiased estimate of the variance is calculated as

\hat{\sigma}^2 = \frac{SSR}{n - n_b} \quad (16)

where n is the number of data points and n_b is the number of model parameters (or breakpoints used). Then the standard error (SE) of the β_j model parameter is

SE(\beta_j) = \sqrt{\hat{\sigma}^2 \left[(A^T A)^{-1}\right]_{jj}} \quad (17)

for each parameter j = 1, \cdots, n_b. It is often assumed that the parameters follow a normal distribution, with mean β_j and standard deviation SE(β_j).
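A short NumPy sketch of Eqns. (16) and (17), using an ordinary straight-line design matrix rather than the piecewise one for brevity; the standard_errors helper and the simulated data are assumptions made for this example, not pwlf API.

```python
import numpy as np

def standard_errors(A, y):
    """Eqns. (16)-(17): unbiased variance estimate and the standard error
    of each least squares parameter for a design matrix A."""
    n, nb = A.shape
    beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    e = A @ beta - y
    sigma2 = (e @ e) / (n - nb)                              # Eqn. (16)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))   # Eqn. (17)
    return beta, se

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 3.0 + 2.0 * x + rng.normal(0.0, 0.5, x.size)   # noisy line, true beta = (3, 2)
A = np.column_stack([np.ones_like(x), x])
beta, se = standard_errors(A, y)
print(beta, se)
```

The fitted parameters land close to the true (3, 2), and the standard errors quantify how far the noise can plausibly push them.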
x. The reason for this is that the prediction variance can be calculated at any x within the model, and is not restricted to the original x data. The prediction variance as a function of x̂ is given by

PV(\hat{x}) = \hat{\sigma}^2 \, \mathrm{diag}\left(\hat{A} (A^T A)^{-1} \hat{A}^T\right) \quad (21)

where diag represents the diagonal of the matrix. It is generally assumed that ŷ follows a normal distribution, with standard deviation equal to \sqrt{PV(\hat{x})}.
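Eqn. (21) can be evaluated in a few lines of NumPy. The sketch below again uses a plain straight-line model for brevity; prediction_variance and the simulated data are assumptions for the example, not pwlf's internal code.

```python
import numpy as np

def prediction_variance(A, A_hat, sigma2):
    """Eqn. (21): variance of the model prediction at new locations,
    where A_hat is the regression matrix assembled at x_hat."""
    return sigma2 * np.diag(A_hat @ np.linalg.inv(A.T @ A) @ A_hat.T)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 40)
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.3, x.size)
A = np.column_stack([np.ones_like(x), x])
beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
e = A @ beta - y
sigma2 = (e @ e) / (A.shape[0] - A.shape[1])     # Eqn. (16)

x_hat = np.linspace(0.0, 10.0, 5)                # new locations, not the data
A_hat = np.column_stack([np.ones_like(x_hat), x_hat])
pv = prediction_variance(A, A_hat, sigma2)
print(pv)
```

As expected for linear regression, the variance is smallest near the center of the data and grows toward the edges of the domain.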
Two different optimization strategies are currently utilized. The first strat-
egy utilizes Differential Evolution (DE) for the global optimization [13]. The
DE optimization algorithm is used in the fit function as pwlf’s default optimization strategy. The specific DE strategy used is the implementation in SciPy [14]. The DE optimization strategy is a very popular heuristic optimizer
that has been used in a variety of applications. However, cases may arise where
the progress of DE is deemed too slow, too expensive, or the DE results are
consistently undesirable.
As an alternative to DE, a multi-start gradient-based optimization strategy is used in the fitfast function. The multi-start optimization algorithm first generates an initial population from a Latin hypercube sampling3. This is a space-filling experimental design, where each point in the population represents a unique combination of breakpoint locations. A local optimization is then performed from each point in the population. Running multiple local optimizations is a strategy that attempts to find the global optimum, and such a strategy was mentioned by Muggeo [15] for solving problems with multiple local minima. The local optimization algorithm being used is the L-BFGS-B [16] gradient-based optimizer implemented in SciPy [14]. Schutte et al. [17] observed a case where multi-start optimization performance may exceed running a single optimization algorithm for an extended period of time. The caveat with multi-start gradient-based optimization algorithms is that each individual optimization may get stuck at a local minimum; increasing the number of starting points therefore increases the chances of finding the global optimum.
3 The Latin hypercube sampling is done using the pyDOE package: https://pythonhosted.org/pyDOE/
The overall methodology described is an optimization within an optimiza-
tion, or a double-loop optimization. There is an inner optimization occurring at
every function evaluation. This inner optimization is the least squares fit which
finds the best continuous piecewise linear model for a given set of breakpoint
locations. It is required to solve the least squares problem several times within
the outer optimization process. As shown later in the examples, the outer opti-
mization process can be used to find breakpoint locations in a short amount of
time on a modern computer. This is largely possible because of the efficiency
of the least squares method. Finding the breakpoint locations for other error measures (e.g. minimizing the average absolute deviation, or any other L_N norm) would be a significantly more expensive problem, because an iterative solver would be required within the inner loop of the optimization process.
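The double-loop idea can be sketched directly with SciPy's differential_evolution as the outer optimizer and NumPy's least squares as the inner loop. This is a simplified illustration of the strategy, not pwlf's implementation; the synthetic data set and the assemble_A helper are assumptions for the example.

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))
# two-segment data with a true breakpoint at x = 4.0, plus a little noise
y = np.where(x < 4.0, 2.0 * x, 8.0 + 0.5 * (x - 4.0)) + rng.normal(0.0, 0.1, x.size)

def assemble_A(x, breaks):
    # regression matrix of Section 2.1 for ordered breakpoints
    A = np.ones((x.size, len(breaks)))
    A[:, 1] = x - breaks[0]
    for k in range(2, len(breaks)):
        A[:, k] = np.where(x > breaks[k - 1], x - breaks[k - 1], 0.0)
    return A

def ssr(interior):
    # inner loop: cheap least squares fit for a candidate breakpoint set
    breaks = np.concatenate(([x[0]], np.sort(interior), [x[-1]]))
    A = assemble_A(x, breaks)
    beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    e = A @ beta - y
    return e @ e

# outer loop: DE searches the single interior breakpoint (two line segments)
res = differential_evolution(ssr, bounds=[(x[0], x[-1])], seed=0)
print(res.x)   # the optimizer should land near the true breakpoint at 4.0
```

Each outer function evaluation costs only one small least squares solve, which is why the search over breakpoint locations stays fast.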
3.1 Installation
It is recommended to install pwlf using pip by running

pip install pwlf

in a shell (or command prompt)4. This will download and install the latest pwlf release along with the necessary dependencies. The dependencies are the following:
• Python >= 2.7
• NumPy >= 1.14.0
4 The PyPA recommended tool for installing packages is pip: https://pypi.org/project/pip/
3.2 Versioning
To import and check the version of pwlf run
import pwlf
pwlf.__version__
where model becomes the working object in which all fitting routines are run
for the particular x and y data. If the breakpoint locations are known, use
model.fit_with_breaks to perform a least squares fit. If breakpoint locations are
unknown, use model.fit or model.fitfast to perform a fit by specifying the desired
number of line segments. Once a fit has been performed, the object will contain
the following attributes:
5 https://semver.org/spec/v2.0.0.html
4 Examples
Simple examples are provided for the following use-cases:
1. fit with explicit breakpoint locations
2. fit for specified number of line segments
3. force the fit through data points
4. use a custom optimization routine to find the optimal breakpoint locations
For additional examples, please look in the examples folder within the source, which is available at https://github.com/cjekel/piecewise_linear_fit_py.
To get started with the examples: first import the necessary libraries, copy
the x and y data, and finally initialize the fitting object as model.
import numpy as np
import pwlf
x = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.,
12., 13., 14., 15.])
y = np.array([5., 7., 9., 11., 13., 15., 28.92, 42.81, 56.7,
70.59, 84.47, 98.36, 112.25, 126.14, 140.03])
model = pwlf.PiecewiseLinFit(x, y)
Prediction from a fitted model just requires calling the predict method on
new x locations. The following code evaluates the model on 100 new x̂ locations
within the domain of x.
The resulting fit and data can be (optionally) plotted using the matplotlib
library. The resulting fit is shown in Fig. 1.
import matplotlib.pyplot as plt
plt.figure()
plt.plot(x, y, 'o')
plt.plot(x_hat, y_hat, '-')
plt.grid()
plt.xlabel('x')
plt.ylabel('y')
plt.show()
[Figure 1: the resulting continuous piecewise linear fit plotted with the original data; y versus x.]
To fit for a specified number of line segments, model.fit is used with the desired number of segments. The following fits two line segments and returns the optimized breakpoint locations:

breakpoints = model.fit(2)
[Figure 2: the fit obtained for two line segments; y versus x.]
Figure 3: Example of finding the best two continuous piecewise lines that go
through the point (0.0, 0.0).
5 Conclusion
A methodology was presented for fitting continuous piecewise linear functions to
1D data. If breakpoints (the locations where each line segment terminates) are known, then a simple least squares fit is performed to find the best continuous
piecewise linear function. If breakpoints are unknown but the desired number
of line segments is known, then optimization is used to find the breakpoint
locations of the best continuous piecewise linear function. This is a double
optimization process. The outer loop is attempting to find the best breakpoint
locations, while the inner loop is performing a least squares fit to find the best
β model parameters. This methodology of fitting continuous piecewise linear functions, as well as the various statistics associated with this particular regression model, has been discussed in detail. A few examples of the basic usage of pwlf were described in this paper. Additionally, there are a number of other examples available online at https://github.com/cjekel/piecewise_linear_fit_py.
Acknowledgments
Charles F. Jekel would like to acknowledge University of Florida’s Graduate
Preeminence Award and U.S. Department of Veterans Affairs’ Educational As-
sistance program for providing funding for his PhD.
Thanks to Raphael Haftka for his numerous comments related to optimiza-
tion and linear regression.
References
[1] N. M. Fyllas, S. Patiño, T. R. Baker, G. Bielefeld Nardoto, L. A. Martinelli, C. A. Quesada, R. Paiva, M. Schwarz, V. Horna, L. M. Mercado, A. Santos, L. Arroyo, E. M. Jiménez, F. J. Luizão, D. A. Neill, N. Silva, A. Prieto, A. Rudas, M. Silviera, I. C. G. Vieira, G. Lopez-Gonzalez, Y. Malhi, O. L. Phillips, and J. Lloyd, “Basin-wide variations in foliar properties of Amazonian forest: phylogeny, soils and climate,” Biogeosciences, vol. 6, no. 11, pp. 2677–2708, 2009. [Online]. Available: https://www.biogeosciences.net/6/2677/2009/
[2] C. Ocampo-Martinez and V. Puig, “Piece-wise linear functions-based model predictive control of large-scale sewage systems,” pp. 1581–1593, 2010. [Online]. Available: http://digital-library.theiet.org/content/journals/10.1049/iet-cta.2009.0206
[3] R. Heinkelmann, J. Böhm, S. Bolotin, G. Engelhardt, R. Haas, R. Lanotte, D. S. MacMillan, M. Negusini, E. Skurikhina, O. Titov, and H. Schuh, “VLBI-derived troposphere parameters during CONT08,” Journal of Geodesy, vol. 85, no. 7, pp. 377–393, Jul. 2011. [Online]. Available: https://doi.org/10.1007/s00190-011-0459-x
[4] S. Klikovits, A. Coet, and D. Buchs, “ML4CREST: Machine Learning for CPS Models.”
[5] G. Villarini, J. A. Smith, and G. A. Vecchi, “Changing Frequency of Heavy Rainfall over the Central United States,” Journal of Climate, vol. 26, no. 1, pp. 351–357, 2013. [Online]. Available: https://doi.org/10.1175/JCLI-D-12-00043.1
[6] J. Ollerton, H. Erenler, M. Edwards, and R. Crockett, “Extinctions of aculeate pollinators in Britain and the role of large-scale agricultural changes,” Science, vol. 346, no. 6215, pp. 1360–1362, 2014. [Online]. Available: http://science.sciencemag.org/content/346/6215/1360
[7] E. Jauk, M. Benedek, B. Dunst, and A. C. Neubauer, “The relationship between intelligence and creativity: New support for the threshold hypothesis by means of empirical breakpoint detection,” Intelligence, vol. 41, no. 4, pp. 212–221, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S016028961300024X
[8] V. M. R. Muggeo, “Estimating regression models with unknown break-points,” Statistics in Medicine, vol. 22, no. 19, pp. 3055–3071, 2003. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.1545
[9] N. Golovchenko, “Least-squares fit of a continuous piecewise linear function,” 2004.
[14] E. Jones, T. Oliphant, P. Peterson, and Others, “SciPy: Open source scientific tools for Python.” [Online]. Available: http://www.scipy.org/
[16] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, “A limited memory algorithm for bound constrained optimization,” SIAM Journal on Scientific Computing, vol. 16, no. 5, pp. 1190–1208, 1995.