
Computers and Chemical Engineering 134 (2020) 106711


Optimal estimation of physical properties of the products of an atmospheric distillation column using support vector regression

Ahmet Can Serfidan a,b, Firat Uzman a,b, Metin Türkay b,*

a TUPRAS, Petrol cad. No: 25, Korfez, Kocaeli 41780, Turkey
b Koc University, Rumelifeneri Yolu, Sariyer, Istanbul 34450, Turkey
* Corresponding author: M. Türkay

Article history: Received 15 June 2019; Revised 30 November 2019; Accepted 27 December 2019; Available online 2 January 2020.

Keywords: Data analytics; Optimization; Parameter estimation; Support vector regression; Atmospheric distillation.

Abstract: The atmospheric distillation column is one of the most important units in an oil refinery, where crude oil is fractioned into its more valuable constituents. Almost all of the state-of-the-art online equipment has a time lag to complete the physical property analysis in real time due to the complexity of the analyses. Therefore, estimation of the physical properties from online plant data with a soft sensor has significant benefits. In this paper, we estimate the physical properties of the hydrocarbon products of an atmospheric distillation column by support vector regression (SVR) using Linear, Polynomial and Gaussian Radial Basis Function kernels, and the SVR parameters are optimized by using a variety of algorithms including genetic algorithm, grid search and non-linear programming. The optimization-based data analytics approach is shown to produce superior results compared to linear regression; the mean testing error of estimation is improved by 5% with SVR, from 4.01 °C to 3.8 °C.

© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

Crude Distillation Unit (CDU) is the heart of a petroleum refinery where crude oil is separated into its naturally occurring fractions. Atmospheric Distillation Column (ADC) is the most complex unit in the CDU, which operates at atmospheric pressure (Gary et al., 2010; Leffler, 1985). The operation parameters of an ADC affect the amounts and physical properties of its products. Dynamically changing demands and prices for end products force the planners to frequently update the optimal operating parameters and maximize the amount of a particular product while keeping the products within their specified limits. This gives rise to the need for online monitoring of the properties.

The online monitoring of temperature, pressure and flowrate within the ADC is possible with the measurement devices connected to Distributed Control Systems (DCS). However, online monitoring of hydrocarbon properties is only possible by installing online analyzers, which are very complex, hard to maintain and expensive. Therefore, chemical composition and physical properties are monitored by taking samples periodically from the ADC and analyzing the samples in a laboratory with appropriate equipment, which takes up a considerable amount of time.

The lack of online measurement of hydrocarbon properties can be complemented by online estimation methods. An estimation function with temperature and pressure as input variables can be fitted to laboratory analysis data of the physical properties. The ADC in İzmit Refinery of Turkish Petroleum Refineries Corporation, which is the subject of this study, has employed linear regression for the online estimation of the critical properties of hydrocarbon products, like the 95% boiling point temperature of heavy diesel. This prediction embodies a fixed function of heavy diesel tray temperature and flash zone temperature and pressure, which is then fitted to laboratory data by a gain and a bias. There have also been studies on rigorous modeling of ADCs. This approach is highly complex, embodies a comprehensive physical properties library, refers to the laws of thermodynamics and is hard to fit to a real operating unit. Though many of the hydrocarbon properties can be gathered from the simulation results of a rigorous model, the rigorous model still needs periodic maintenance as it sways away from the real system in time.

Recent studies have successfully employed Machine Learning in modeling chemical processes in petroleum refining. The Artificial Neural Network (ANN) has been a popular choice in this area but has suffered from over-fitting. ANN, like any regression method that embodies Empirical Risk Minimization, minimizes the estimation error regardless of the model complexity. This in turn leads to poor performance in generalization to unseen and unvalidated testing data.

More recent studies have utilized Support Vector Machines (SVM) in regression problems. SVMs, in contrast to ANN, embody Structural Risk Minimization, which aims to generate a flatter and less complex function. By implementing a new loss function, Support Vector Regression (SVR) chooses the flattest path where the error is kept within a predefined width for the insensitive region and a predefined cost factor handles outlier data. Moreover, by employing kernel functions, SVR can be trained to generate a nonlinear estimation function. Lastly, the user-defined parameters of SVR can be optimized via cross validation embedded in global optimizers like Genetic Algorithm, or simpler solvers like Grid Search, to maximize generalization performance.

2. Literature survey

Since Crude Distillation Units are complex and energy-intensive units, safe and optimum operation of the unit has great significance in oil refining. This leads to a need for proper monitoring of the operation for process engineers and a proper estimation of the operation dynamics for planning engineers. Both objectives prove to be troublesome, as the physical properties can be measured either by taking a sample from the stream periodically and then analyzing these samples in a laboratory with appropriate equipment, or by online analyzers that are very expensive, complex and hard-to-maintain sets of equipment. The periodicity in measurement leads to sub-optimal control of the unit or, in some cases, to off-spec products. This gives rise to the need for estimating the plant dynamics and physical properties of the product streams. Different methods, like rigorous distillation column models that depend on the laws of thermodynamics and statistical models that depend on online plant data, have been incorporated to estimate the behavior of a distillation column.

First-principle mechanistic models have been developed for simulating a steady-state, multicomponent and multistage distillation column. A rigorous method by Naphtali and Sandholm (1971) has been further improved by Boston and Sullivan (1974), Hofeling and Seader (1978), Russell (1983), Kumar et al. (2001) and many others. Many researchers have further studied the use of rigorous methods in optimization of distillation columns (Basak et al., 2002; More et al., 2010; Inamdar et al., 2004; Seo et al., 2000).

Any rigorous model has mass and energy balance equations, needs a comprehensive physical properties library and is very complex. One needs extensive knowledge of the method to be able to fit a model to an existing distillation column, and the model usually deviates from the real data in time. The model is only as good as the physical property estimation library selected for predicting properties like phase equilibrium coefficients and enthalpies.

The enthalpies cannot be calculated with the assumption that the components act like an ideal gas, since the molecules attract and repulse each other and have finite volume. Equations of state proposed by many academicians (Peng and Robinson, 1976; Redlich and Kwong, 1949; Soave, 1972) can be used to solve for the phase equilibrium coefficients and excess enthalpies. The Peng–Robinson equation of state (PR) (Peng and Robinson, 1976) and the Soave modification of Redlich–Kwong (SRK) (Soave, 1972) are popularly used for predicting these properties of hydrocarbons.

Crude oil has over 1 million different molecules in its mixture, where some are non-hydrocarbon molecules. Light hydrocarbons with as many as 6 or 7 carbon atoms may be defined as pure components, but the rest of the molecules are grouped by true boiling point ranges as pseudo-components. To solve for phase equilibrium coefficients and enthalpies, one needs the properties of these components, like critical temperature TC, critical pressure PC and acentric factor ω.

Properties for pseudo-components, like molecular weight, critical temperature, critical pressure, acentric factor, heat of formation and ideal gas enthalpy, can be predicted by empirical correlations which are documented in the Petroleum Data Book of the American Petroleum Institute (API, 2008).

A commonly used method in many chemical process simulators is the inside-out algorithm by Russell (1983), which is explained in Khoury (2014). This method simplifies the phase equilibrium coefficient and enthalpy calculations, which are computationally expensive, by linearizing the functions in terms of the stage temperature. The inner loop solves the model according to the performance specifications given by the user with the simplified functions. Then the outer loop updates the phase equilibrium coefficients and enthalpy calculations, and checks if the update is within the specified limits. The algorithm terminates if the update is within the defined limits. If it is not, the functions are linearized again for another set of inner loop iterations. This method requires little information for generating initial estimates and has lower computational cost.

Statistical models are widely used for prediction of the relationship between a dependent variable and one or more independent variables. There are many methods for generating a statistical model; the most common among them is the least squares regression method, which may give misrepresentative results with outliers and noise in the training data and may not generalize well to unseen and unchecked testing data.

The concept of Support Vector Machines (SVM) was established by Vapnik and Lerner (1963) and Vapnik and Chervonenkis (1968) in 1963 for solving classification problems. Nonlinear classification was then made possible by the introduction of kernel functions, proposed by Boser et al. (1992) and originally suggested by Aizerman et al. (1964). The problem of outliers has finally been solved by soft margins, which were proposed by Cortes and Vapnik (1995).

Interest in SVM has increased over the last 15 years due to its distinctive features and performance in solving regression and classification problems. A detailed technical report on SVM for its use in classification and regression problems was published by Gunn (1998). Detailed tutorials on Support Vector Regression (SVR) were published by Smola and Schölkopf (1998, 2004). A general-purpose library has been distributed by Chang and Lin (2011) as LIBSVM, which can be integrated into many programming languages and data mining software.

SVM has also been used in modeling of chemical processes like polymerization (Lee et al., 2005), desulfurization (Shokri et al., 2015), polymer extrusion (Chitralekha and Shah, 2010) and isomerization (Li et al., 2009). Yao and Chu (2012) employed SVM in modeling and optimization of an atmospheric distillation column. Yan et al. (2004) developed a soft sensor for estimating the freezing point of light diesel in a distillation column. Lahiri and Ghanta (2009) employed SVR in estimating the critical velocity of solid-liquid slurry flow.

Yao and Chu (2012) applied SVR to a real atmospheric distillation column and optimized the SVR parameters using a particle swarm optimizer with linearly decreased inertia weight (LinWPSO), a version of the global optimizer proposed by Shi and Eberhart (1998). Lahiri and Ghanta (2009) optimized the SVR parameters using a genetic algorithm (GA). Yao and Chu (2012) modeled the atmospheric distillation column in a commercial chemical process simulation tool, Aspen Plus, and used data generated from case studies. Since the training data contained simulation results, there was neither noise nor outliers. Therefore, the SVR parameter ε to be explained below (see Eqs. (2) and (3) and Fig. 1) was fixed to 0 (zero), which defeats the purpose of creating sparseness in the set of support vectors.
The difference between SVM and the other regression methods is that SVM minimizes the structural risk, as opposed to other regression methods that employ empirical risk minimization. Empirical risk minimization only focuses on minimizing the error of estimation, which may end up with a model that describes the noise or outliers and will have poor performance in predicting unseen test data, whereas structural risk minimization focuses on training a function that is as flat as possible.

In the regression case, SVR has two main parameters ε and C that affect the balance between flatness versus error of estimation. Parameter ε describes the insensitive region in which the error of estimation has no cost, and parameter C describes the weight of error outside the insensitive region. When kernel functions are used, one or more extra parameters may be introduced. A large value for parameter C will make the algorithm put all training error within the insensitive region, which will cause the SVR to memorize the training data and generalize poorly to unseen test data. A small value for parameter ε will assume that there is no noise in the data and again will cause the SVR to overfit. In contrast, a small value for parameter C or a large value for parameter ε will cause the SVR to underfit. Applying a cross validation method in optimizing these parameters will ensure the regression of a function that has great performance in prediction of unseen data.

Suppose that we have a set of data {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)}, x_i ∈ R^n, y_i ∈ R, where x_i can be a scalar or a vector containing the input values and y_i is the corresponding output value. In this study, the data set will contain the control parameters of a distillation column as x, and one of the hydrocarbon properties to be estimated as y. In Support Vector Regression, we want to estimate y with a function f(x) that has least deviation from the data and, at the same time, is as flat as possible and has lowest model complexity.

The estimation function has the form

\[ f(x) = \langle w, x\rangle + b, \qquad w \in \mathbb{R}^n,\ b \in \mathbb{R}, \tag{1} \]

where w is a combination of the support vectors that have been selected among the data set, \(\langle\cdot,\cdot\rangle\) denotes the dot product, and b is the bias term. The use of the dot product leads to a linear estimation function, but it is possible to use kernel functions that construct a mapping into a high dimensional feature space, which enables the linear regression of a non-linear estimation function.

A special loss function has been introduced to Support Vector Regression, similar to, but different from, widely known loss functions like the Quadratic, Laplace or Huber functions. This loss function is insensitive to errors less than an error tolerance of ε, which enables sparseness in the set of support vectors (Cortes and Vapnik, 1995). This new loss function is named the ε-insensitive function and will be further discussed together with the other mentioned loss functions.

The first attempt at Support Vector Regression was with a hard constraint, where the goal was to estimate y with a function f(x) that has at most ε deviation from the training data and, at the same time, has lowest model complexity and is as flat as possible. This means any error lower than ε is neglected, while any error more than ε is not tolerated. For flatness and lower model complexity, one should pursue a small combination of support vectors, w. To do this we can minimize the function \(\|w\|^2\), which is the dot product \(\langle w, w\rangle\). The error of estimation will have a hard constraint and the optimization problem for regression can be written as:

\[ \min\ \tfrac{1}{2}\|w\|^2 \qquad \text{s.t.}\quad \begin{cases} y_i - f(x_i) \le \varepsilon, \\ f(x_i) - y_i \le \varepsilon. \end{cases} \tag{2} \]

This optimization problem will estimate all input and output pairs within a tolerance of ε. However, this problem may not always have a feasible solution, or we may want to tolerate some degree of error. Thus the soft constrained loss function of Bennett and Mangasarian (1992) was adapted into support vector machines by Cortes and Vapnik (1995). Slack variables ξ_i and ξ_i* and a scalar constant C were introduced into the optimization problem. The slack variables account for the positive or negative errors of estimation outside of the insensitive region. These slack variables need to be minimized according to the user-defined positive weight factor C, which determines the tradeoff between the flatness of the estimation function and the error tolerance. So, C is a hyperparameter that controls how much we penalize the use of slack variables. The resulting optimization problem for regression can be written as:

\[ \min\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\left(\xi_i + \xi_i^*\right) \qquad \text{s.t.}\quad \begin{cases} y_i - f(x_i) \le \varepsilon + \xi_i, \\ f(x_i) - y_i \le \varepsilon + \xi_i^*, \\ \xi_i,\ \xi_i^* \ge 0. \end{cases} \tag{3} \]

Some kernels may introduce one or more additional parameters to be defined beforehand, but other than that the generalization performance of the estimation function is greatly dependent on the parameters C and ε.

A case for training a linear estimation function is shown in Fig. 1, where the loss is |ξ| if the estimation error is ε + ξ, and the loss is 0 (zero) if the training data is in the grey region so that the estimation error is less than ε.

[Fig. 1. Soft constraint loss function.]

The optimization problem in Eq. (3) for support vector regression can be easily solved in its dual form. The dual problem can be formed by the method of Lagrange multipliers, which is a standard dualization method explained by Fletcher and Sainz de la Maza (1989).

2.1. The dual problem

First of all, the primal form is converted into a Lagrange function, where the constraints of the primal problem are added to the objective function by the introduction of Lagrange multipliers for each of the constraints. The Lagrange function can be written as:

\[ f_L = \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\left(\xi_i + \xi_i^*\right) - \sum_{i=1}^{l}\alpha_i\left(\varepsilon + \xi_i - y_i + \langle w, x_i\rangle + b\right) - \sum_{i=1}^{l}\alpha_i^*\left(\varepsilon + \xi_i^* + y_i - \langle w, x_i\rangle - b\right) - \sum_{i=1}^{l}\eta_i\xi_i - \sum_{i=1}^{l}\eta_i^*\xi_i^*. \tag{4} \]

The Lagrange multipliers are η_i, η_i*, α_i, α_i* and they are non-negative:

\[ \eta_i,\ \eta_i^*,\ \alpha_i,\ \alpha_i^* \ge 0. \tag{5} \]

The Lagrange function has a saddle point where the partial derivatives of the Lagrange function with respect to the primal variables w, b, ξ_i, ξ_i* are equal to 0 (zero) at the optimal point:

\[ \frac{\partial L}{\partial w} = 0 = w - \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^*\right)x_i, \tag{6} \]
\[ \frac{\partial L}{\partial b} = 0 = \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^*\right), \tag{7} \]

\[ \frac{\partial L}{\partial \xi_i} = 0 = C - \alpha_i - \eta_i, \tag{8} \]

\[ \frac{\partial L}{\partial \xi_i^*} = 0 = C - \alpha_i^* - \eta_i^*. \tag{9} \]

The optimality conditions above help us simplify the dual problem so that it is expressed in terms of the dual variables, which are the Lagrange multipliers. In this case, Eqs. (8) and (9) can be used to eliminate the Lagrange multipliers η_i, η_i* by rearranging the equations as:

\[ \eta_i = C - \alpha_i, \qquad \eta_i^* = C - \alpha_i^*. \tag{10} \]

Since the Lagrange multipliers have to be nonnegative, the Lagrange multipliers α_i, α_i* cannot be larger than C to comply with Eq. (10). Eq. (7) becomes a constraint for the dual problem. We also see that Eq. (6) can be rearranged to express the term w as a combination of the input set of the training data:

\[ w = \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^*\right)x_i. \tag{11} \]

By substituting Eq. (11) into (1), we can rewrite the estimation function as:

\[ f(x) = \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^*\right)\langle x_i, x\rangle + b. \tag{12} \]

By substituting Eqs. (10) and (11) into Eq. (4), with constraints deduced from Eqs. (7) and (10), the primal variables w, b, ξ_i, ξ_i* are eliminated and we get the dual problem as:

\[ \max\ -\tfrac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right)\langle x_i, x_j\rangle - \varepsilon\sum_{i=1}^{l}\left(\alpha_i + \alpha_i^*\right) + \sum_{i=1}^{l} y_i\left(\alpha_i - \alpha_i^*\right) \qquad \text{s.t.}\quad \begin{cases} \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^*\right) = 0, \\ 0 \le \alpha_i,\ \alpha_i^* \le C. \end{cases} \tag{13} \]

Now that the optimization problem does not include w, support vector regression problems, especially with nonlinear feature space transformations, can be solved independently of the dimension of the feature space with quadratic programming.

2.2. Training with quadratic programming

Quadratic programming minimizes a quadratic objective function with decision variables subject to linear equality and inequality constraints:

\[ \min\ \tfrac{1}{2}x^{T}Hx + f^{T}x \qquad \text{s.t.}\quad \begin{cases} Ax \le b, \\ A_{eq}x = b_{eq}, \\ lb \le x \le ub. \end{cases} \tag{14} \]

The dual problem is transformed into a minimization problem:

\[ \min\ \tfrac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right)\langle x_i, x_j\rangle + \varepsilon\sum_{i=1}^{l}\left(\alpha_i + \alpha_i^*\right) - \sum_{i=1}^{l} y_i\left(\alpha_i - \alpha_i^*\right) \qquad \text{s.t.}\quad \begin{cases} \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^*\right) = 0, \\ 0 \le \alpha_i,\ \alpha_i^* \le C. \end{cases} \tag{15} \]

We need to solve for the optimum values of the Lagrange multipliers α_i and α_i*. This study uses β_i = α_i − α_i* and β_i* = α_i + α_i* as decision variables in the quadratic programming to create sparseness in the H matrix for faster convergence. The size of the decision variable vector is 2l, two times the number of training samples. The problem is reformulated as:

\[ \min\ \tfrac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\beta_i\beta_j\langle x_i, x_j\rangle + \varepsilon\sum_{i=1}^{l}\beta_i^* - \sum_{i=1}^{l} y_i\beta_i \qquad \text{s.t.}\quad \begin{cases} \sum_{i=1}^{l}\beta_i = 0, \\ 0 \le \left(\beta_i + \beta_i^*\right),\ \left(\beta_i^* - \beta_i\right) \le 2C, \\ -C \le \beta_i \le C, \\ 0 \le \beta_i^* \le 2C. \end{cases} \tag{16} \]

First, we need to calculate the dot product matrix M, which is of size l × l, as below for linear functions:

\[ M_{l\times l} = \begin{bmatrix} \langle x_1, x_1\rangle & \langle x_1, x_2\rangle & \cdots & \langle x_1, x_l\rangle \\ \langle x_2, x_1\rangle & \langle x_2, x_2\rangle & \cdots & \langle x_2, x_l\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle x_l, x_1\rangle & \langle x_l, x_2\rangle & \cdots & \langle x_l, x_l\rangle \end{bmatrix}. \tag{17} \]

Then, the H matrix of the quadratic programming is written as:

\[ H_{2l\times 2l} = \begin{bmatrix} M & 0 \\ 0 & 0 \end{bmatrix}, \tag{18} \]

and the f matrix is:

\[ f_{2l\times 1} = \begin{bmatrix} -y_1 & \cdots & -y_l & \varepsilon & \cdots & \varepsilon \end{bmatrix}^{T}. \tag{19} \]

Then the matrices A, A_eq and vectors b, b_eq are formed according to the constraints in Eq. (16).
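To make the construction above concrete, the following MATLAB sketch assembles H, f and the constraint matrices of the β-reformulation in Eq. (16) for the linear kernel and solves it with quadprog, MATLAB's quadratic programming tool (the tool reported later in Section 4.2). It is a minimal illustration under stated assumptions, not the authors' code: X is the l-by-n matrix of scaled inputs, y the l-by-1 output vector, and C and epsilon the SVR parameters.

```matlab
% Sketch of the beta-reformulated dual QP of Eq. (16), linear kernel, solved with quadprog.
function [beta, betaStar] = svr_train_qp(X, y, C, epsilon)
    l = size(X, 1);
    M = X * X.';                                   % dot-product matrix, Eq. (17)
    H = blkdiag(M, zeros(l));                      % quadratic term over z = [beta; betaStar], Eq. (18)
    f = [-y; epsilon * ones(l, 1)];                % linear term, Eq. (19)

    % Inequalities of Eq. (16): 0 <= beta + betaStar <= 2C and 0 <= betaStar - beta <= 2C,
    % written in the A*z <= b form expected by quadprog.
    I = eye(l);
    A = [-I, -I;  I, I;  I, -I;  -I, I];
    b = [zeros(l, 1); 2*C*ones(l, 1); zeros(l, 1); 2*C*ones(l, 1)];

    Aeq = [ones(1, l), zeros(1, l)];               % sum(beta) = 0, from Eq. (7)
    beq = 0;
    lb  = [-C * ones(l, 1);  zeros(l, 1)];         % -C <= beta <= C
    ub  = [ C * ones(l, 1);  2 * C * ones(l, 1)];  %  0 <= betaStar <= 2C

    opts = optimoptions('quadprog', 'Display', 'off');
    z = quadprog(H, f, A, b, Aeq, beq, lb, ub, [], opts);
    beta = z(1:l);
    betaStar = z(l+1:end);
end
```

Note how the zero block in H is what produces the sparseness in the quadratic term that Section 5.1 credits for the faster convergence of the reformulated problem.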
2.3. The support vectors

The Karush–Kuhn–Tucker (KKT) conditions, or complementary slackness conditions, state that at the optimum point the product between dual variables and primal constraints should be equal to zero:

\[ \alpha_i\left(\varepsilon + \xi_i - y_i + \langle w, x_i\rangle + b\right) = 0, \qquad \alpha_i^*\left(\varepsilon + \xi_i^* + y_i - \langle w, x_i\rangle - b\right) = 0. \tag{20} \]

Also, the product of primal variables and dual constraints should be equal to zero:

\[ \xi_i\left(C - \alpha_i\right) = 0, \qquad \xi_i^*\left(C - \alpha_i^*\right) = 0. \tag{21} \]

From Eq. (20), we can see that α_i α_i* = 0, meaning at least one of α_i or α_i* should be equal to zero. From Eq. (21), training data with α_i = C or α_i* = C can have positive primal slack variables ξ_i or ξ_i*, meaning the training data can be outside of the ε-insensitive zone if and only if α_i or α_i* is equal to C.

Remember that in Eq. (11) the term w is expanded into a combination of the input vectors. From Eq. (20), when the estimation error |f(x_i) − y_i| is lower than ε, both Lagrange multipliers α_i and α_i* should be equal to zero, since the slack variables have to be nonnegative. Therefore, (α_i − α_i*) vanishes in Eq. (11), which creates sparseness in the expansion of w. As a conclusion, the training data with estimation error |f(x_i) − y_i| ≥ ε have one of the Lagrange multipliers with a nonzero value, which makes them Support Vectors. From now on the support vectors are x_s for s = 1, 2, ..., S ≤ l, and have the coefficient β_s = (α_s − α_s*). The estimation function is rewritten as:

\[ f(x) = \sum_{s=1}^{S}\beta_s\langle x_s, x\rangle + b. \tag{22} \]

Although we expect one or both of the Lagrange multipliers to be equal to zero at the optimal point, quadratic programming solves them to very small numbers close to zero, so support vectors are selected by the relation:

\[ \left(\alpha_i - \alpha_i^*\right) > 10^{-6}C. \tag{23} \]

2.4. The bias term

We will find the bias from the average of support vectors with estimation error equal to ε. These support vectors are x_ss and are identified from x_s with 0 < |β_s| < C. To filter out very small numbers the relation is updated to:

\[ 10^{-6}C < |\beta_s| < \left(1 - 10^{-6}\right)C. \tag{24} \]

Finally, the bias term is calculated by:

\[ b = \frac{1}{SS}\sum_{ss=1}^{SS}\left(y_{ss} - \varepsilon\sigma - \sum_{s=1}^{S}\beta_s\langle x_{ss}, x_s\rangle\right), \tag{25} \]

where σ is equal to 1 if β_ss > 0, or −1 otherwise.

2.5. Kernel functions

Kernel functions are used to construct a mapping into a high dimensional feature space, which enables the linear regression of a non-linear function, meaning that the mapping is preprocessed and SVR is repurposed to find the flattest function in this feature space. The input data x_i is mapped into φ(x_i), but we do not need to define φ(x_i) explicitly since SVR only needs the dot product in the regression procedure and in the final estimation function. Hence, we may define and use the kernel function as:

\[ K(x_i, x_j) = \langle\varphi(x_i), \varphi(x_j)\rangle. \tag{26} \]

In this case, we may rewrite the SVR problem in Eq. (15) as:

\[ \min\ \tfrac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\beta_i\beta_j K(x_i, x_j) + \varepsilon\sum_{i=1}^{l}\beta_i^* - \sum_{i=1}^{l} y_i\beta_i \qquad \text{s.t.}\quad \begin{cases} \sum_{i=1}^{l}\beta_i = 0, \\ 0 \le \left(\beta_i + \beta_i^*\right),\ \left(\beta_i^* - \beta_i\right) \le 2C, \\ -C \le \beta_i \le C, \\ 0 \le \beta_i^* \le 2C. \end{cases} \tag{27} \]

Eq. (11) for the term w is expanded to:

\[ w = \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^*\right)\varphi(x_i). \tag{28} \]

The matrix M in the quadratic programming is rewritten as:

\[ M_{l\times l} = \begin{bmatrix} K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_l) \\ K(x_2, x_1) & K(x_2, x_2) & \cdots & K(x_2, x_l) \\ \vdots & \vdots & \ddots & \vdots \\ K(x_l, x_1) & K(x_l, x_2) & \cdots & K(x_l, x_l) \end{bmatrix}. \tag{29} \]

Finally, the estimation function in Eq. (22) is rewritten as:

\[ f(x) = \sum_{s=1}^{S}\beta_s K(x_s, x) + b, \tag{30} \]

and Eq. (25) for the bias term is revised to:

\[ b = \frac{1}{SS}\sum_{ss=1}^{SS}\left(y_{ss} - \varepsilon\sigma - \sum_{s=1}^{S}\beta_s K(x_{ss}, x_s)\right). \tag{31} \]

It should be noted that the mapping into a high dimensional feature space leads to the curse of dimensionality; that is, as dimensionality increases, the volume of the space increases such that the training data becomes sparse in this high dimensional space. To overcome this problem, a large training set with an even data distribution should be provided to SVR (Gunn, 1998).

There are various types of kernels that can be used in SVR, and combinations of these kernels are possible. Nonlinear kernels may have parameters that are predefined by the user, like the main parameters of the SVR, ε and C.

2.6. Linear kernel

This is actually the dot product of the vectors and no mapping is applied. Since the SVR will be coded with a kernel function, the linear kernel will be among the choices:

\[ K(x_i, x_j) = \langle x_i, x_j\rangle. \tag{32} \]

2.7. Polynomial kernel

The polynomial kernel is among the most popular choices for nonlinear modelling and is formulated as:

\[ K(x_i, x_j) = \left(\langle x_i, x_j\rangle + 1\right)^{d}, \tag{33} \]

where d is the kernel parameter for the degree of the polynomial. The addition of 1 (one) prevents the Hessian from becoming zero.

2.8. Gaussian radial basis function kernel

The Gaussian radial basis function (RBF) kernel, which uses the square of the Euclidean distance between the two vectors, is also widely used in SVM for its flexibility. The Gaussian RBF kernel is defined as:

\[ K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right). \tag{34} \]

This function can be simplified to:

\[ K(x_i, x_j) = \exp\left(-\gamma\|x_i - x_j\|^2\right), \tag{35} \]

where γ = 1/2σ², γ ∈ R, γ > 0 is the kernel parameter.
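As an illustration of Eqs. (32), (33) and (35), the kernels used in this study can be written as simple MATLAB function handles, and the support-vector and bias rules of Eqs. (23)-(25) and (31) then give the final estimator of Eq. (30). The sketch below is illustrative, not the authors' code: beta is the coefficient vector returned by the quadratic program, X, y, C and epsilon are as before, gamma and d are the kernel parameters, and |β_i| is used as the selection measure.

```matlab
% Kernel matrices for row-wise sample matrices Xa (m-by-n) and Xb (p-by-n).
linK  = @(Xa, Xb)        Xa * Xb.';                          % linear kernel, Eq. (32)
polyK = @(Xa, Xb, d)     (Xa * Xb.' + 1).^d;                 % polynomial kernel, Eq. (33)
rbfK  = @(Xa, Xb, gamma) exp(-gamma * (sum(Xa.^2, 2) ...     % Gaussian RBF kernel, Eq. (35)
                          + sum(Xb.^2, 2).' - 2 * (Xa * Xb.')));

kernel = @(Xa, Xb) rbfK(Xa, Xb, gamma);      % chosen kernel; replaces M of Eq. (17) by Eq. (29)

% Support vectors and bias, Eqs. (23)-(25) and (31).
sv  = abs(beta) > 1e-6 * C;                  % support vectors
bnd = sv & abs(beta) < (1 - 1e-6) * C;       % support vectors lying on the eps-tube
sigma = sign(beta(bnd));                     % +1 if beta > 0, -1 otherwise
bias = mean(y(bnd) - epsilon * sigma ...
       - kernel(X(bnd, :), X(sv, :)) * beta(sv));            % Eq. (31)

% Final estimator of Eq. (30) for new (already scaled) inputs Xnew.
svr_predict = @(Xnew) kernel(Xnew, X(sv, :)) * beta(sv) + bias;
```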
2.9. Exponential radial basis function kernel

The exponential RBF kernel, which employs the first-degree Euclidean distance between the two vectors, generates a piecewise linear solution and is defined as:

\[ K(x_i, x_j) = \exp\left(-\gamma\|x_i - x_j\|\right), \tag{36} \]

where γ = 1/2σ², γ ∈ R, γ > 0 is the kernel parameter.

2.10. Sum of kernels

More complex kernels can be formed by summation of two or more kernels as:

\[ K(x_i, x_j) = \sum_{k}K_k(x_i, x_j). \tag{37} \]

2.11. Product of kernels

Similarly, kernels can be formed by taking products of two or more kernels as:

\[ K(x_i, x_j) = \prod_{k}K_k(x_i, x_j). \tag{38} \]

2.12. Loss functions

In this section, we examine four types of loss function which penalize the distance between the estimation function and the training data (see Fig. 2 for an illustration of these functions). Any of the loss functions mentioned in this section can be integrated into SVR. However, the loss function with the best generalization performance and that creates sparseness in the set of support vectors has been employed in this paper.

[Fig. 2. Loss functions.]

The Quadratic loss function in Fig. 2(a) is actually the squared error of estimation and is commonly used for regression. The quadratic loss function is sensitive to outliers, leading to a sway from the mean by heavy-tailed distributions.

On the other hand, the Laplace loss function in Fig. 2(b) is the absolute value of the estimation error, which is less sensitive to outliers. However, the Laplace loss function cannot be differentiated at an error equal to zero.

The Huber loss function in Fig. 2(c) is a combination of the Quadratic and Laplace loss functions. The function is quadratic for small error values and linear for large error values. The functions and their slopes are equal at the intersections of these two functions. This property combines the sensitivity of the Quadratic loss function and the robustness of the Laplace loss function. The Huber loss function has the form:

\[ L\left(f(x) - y\right) = \begin{cases} \tfrac{1}{2}\left(f(x) - y\right)^2 & \text{for } |f(x) - y| < \mu, \\ \mu|f(x) - y| - \tfrac{\mu^2}{2} & \text{otherwise}. \end{cases} \tag{39} \]

2.13. ε-Insensitive loss function

The three loss functions above will show no sparseness if used in support vector regression, since any error, even a small one, creates a cost. Cortes and Vapnik (1995) modified the Huber loss function into the ε-insensitive loss function in Fig. 2(d), which has an insensitive region at small error values, leading to sparseness in the set of support vectors. With a large training set, sparseness greatly reduces computational time. This function has the parameter ε, which controls the width of the insensitive region. When this parameter is equal to zero, we recover the Laplace loss function. This paper focuses on the ε-insensitive loss function as it is mainly used by support vector machines.

3. Data normalization

The rate of convergence in training the support vector model is greatly affected by the eigenvalues of the Hessian. We can improve this rate by decreasing the difference between the smallest and largest eigenvalues. Therefore, this study strongly recommends and applies normalization of the input data to improve the rate of convergence. Some kernel functions can only be defined in restricted intervals, which already requires data normalization. For unrestricted kernels, this study scales the data to the interval [−1, 1].

[Fig. 3. Scaling methods.]

Isotropic or anisotropic scaling can be applied for data normalization. With anisotropic scaling, each input variable is separately scaled between the lower bound and the upper bound. With isotropic scaling, all of the input variables are shrunk into the bounds by the same scaling factor. Fig. 3 illustrates the scaling of two data ranges by these methods. The isotropic scaling method may be adopted when all input variables are of the same unit. This paper applies anisotropic scaling, because the input variables contain temperature and pressure readings.
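A minimal sketch of the anisotropic scaling described above: each input column is mapped to [−1, 1] using its own bounds, and the same bounds must be reused for any new data (for example, online measurements). Xraw is assumed to be the raw l-by-n input matrix; the names are illustrative.

```matlab
% Anisotropic scaling of each input variable to the interval [-1, 1].
lo = min(Xraw, [], 1);                            % per-column minima, 1-by-n
hi = max(Xraw, [], 1);                            % per-column maxima, 1-by-n
scaleFcn = @(Xany) 2 * (Xany - lo) ./ (hi - lo) - 1;

X = scaleFcn(Xraw);                               % scaled training inputs
% Xnew = scaleFcn(XrawNew);                       % new data must reuse the training bounds
```

For isotropic scaling, a single common factor (for example, the largest of the per-column ranges) would be used for every column instead of the column-wise ranges.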
3.1. Optimizing SVR parameters

The ε-insensitive loss function has a user-defined parameter ε that describes the width of the insensitive range (Smola and Schölkopf, 2004). The value of the parameter ε directly affects the number of support vectors in the estimation function. A small value will increase model complexity and cause the SVR to select more support vectors, memorize the training data and have poor generalization performance, while a larger value will lead to fewer support vectors and a flatter estimation function, but may then fail to estimate the effect of an input variable on the output variable.

The soft constrained loss function was incorporated into SVM by Cortes and Vapnik (1995). It introduces the SVR parameter C, which handles the problem of outliers by creating a cost for estimation errors larger than ε. A large value will increase the model complexity and cause the SVR to be less tolerant of deviations larger than ε, while a small value will lead to a flatter estimation function. Kernel functions were incorporated into SVM by Boser et al. (1992), and some kernels may have user-defined parameters. The user-defined parameters of SVR can be optimized by cross validation methods integrated into global optimization tools, where a cross validation is applied at each evaluation called by the optimization solver.

3.2. Cross-validation

Cross validation methods are used for partitioning the data set into training and testing sets. The training set is used in training the estimation function, while the testing set is used for monitoring the generalization performance of the estimation function. There are many kinds of cross validation methods and this study focuses on the k-fold cross validation method, because it is a non-exhaustive method with reasonable computational cost.

The k-fold cross validation method shuffles and partitions the data set into k equally sized subsets. One of the subsets is selected as the testing set and the rest of the subsets are used as the training set. The cross-validation process is repeated k times, thus naming the method k-fold, where each subset is used exactly once for testing. The squared error from each iteration is averaged to give a single mean squared error (MSE). The parameter k is user-defined; a high value for k will increase the performance of generalization while also increasing computational cost.

3.3. Optimization

Previous studies show that optimization tools like Genetic Algorithm or Particle Swarm Optimizer are used for optimizing SVR parameters in modeling chemical processes. Grid search or constrained nonlinear programming may also be used for this purpose.

3.4. Genetic algorithm

Genetic algorithm (GA) is a heuristic optimization method belonging to the class of evolutionary algorithms. This method mimics the process of natural selection, where a population of candidate solutions evolves through generations to a better solution.

The initial population is usually generated randomly at a user-defined size, and at each generation a predefined ratio of the population survives to see the next generation; these survivors are called the elites. The elites then breed via crossover and mutation at predefined ratios to complete the size of the population and form the next generation. In the crossover process, elites exchange their variable values to create their children. In the mutation process, the variable values of the children mutate into new values. The iteration is looped until the problem does not improve or a pre-defined generation limit is reached. Table 1 gives the critical parameters of GA that need to be tuned for better performance. The ratios for elite selection, crossover and mutation should be tuned so that the algorithm can evolve to a better solution without losing good solutions. Population size and generation limit should also be tuned, so that a moderate amount of evaluation is conducted at an acceptable computational cost.

Table 1. Tuned parameters of genetic algorithm.
  Parameter   Description
  P           Population size
  Gmax        Generation limit
  sE          Elite count
  rC          Cross-over ratio

3.5. Adaptive grid search algorithm

The grid search (GS) method is another tool popularly used for optimizing SVR parameters. This method applies a mesh on the search space and re-meshes around the best-known solutions until the best solution improves by less than a given tolerance or the mesh size is under a given tolerance.

The performance of the grid search method highly depends on the resolution of the mesh and the search method. The grid search method may miss a good solution due to low resolution of the mesh. However, a high resolution will have a high computational cost, since each edge of the mesh should be evaluated.
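The interplay between Sections 3.2 and 3.5 can be sketched as follows: a k-fold cross-validation routine scores one candidate parameter set, and a simple (here non-adaptive) grid search calls it for every mesh point; an adaptive variant would re-mesh around the incumbent until the tolerances described above are met. This is a minimal sketch under stated assumptions, not the authors' implementation: the bounds are illustrative and loosely follow Tables 4 and 5, and svrTrain and svrPredict are hypothetical placeholders for the training and prediction steps sketched in Section 2.

```matlab
% Plain grid search over (C, epsilon, gamma) using the cross-validated MSE as the objective.
best = struct('mse', inf);
for C = logspace(0, 3, 7)                          % 1 ... 1000
    for epsilon = logspace(-3, 0, 7)               % 0.001 ... 1
        for gamma = logspace(-4, 2, 7)             % illustrative range for the RBF parameter
            mse = crossValidateSVR(X, y, C, epsilon, gamma, 5);
            if mse < best.mse
                best = struct('mse', mse, 'C', C, 'epsilon', epsilon, 'gamma', gamma);
            end
        end
    end
end
% An adaptive search would now refine the mesh around best.C, best.epsilon and best.gamma.

% k-fold cross-validation score (mean squared error) for one parameter set.
% (Local functions go at the end of a MATLAB script file.)
function mse = crossValidateSVR(X, y, C, epsilon, gamma, k)
    l = size(X, 1);
    idx = randperm(l);                             % shuffle before partitioning into k folds
    edges = round(linspace(0, l, k + 1));
    err = zeros(k, 1);
    for fold = 1:k
        testIdx  = idx(edges(fold) + 1 : edges(fold + 1));
        trainIdx = setdiff(idx, testIdx);
        model = svrTrain(X(trainIdx, :), y(trainIdx), C, epsilon, gamma);   % hypothetical helper
        yHat  = svrPredict(model, X(testIdx, :));                           % hypothetical helper
        err(fold) = mean((yHat - y(testIdx)).^2);
    end
    mse = mean(err);
end
```

A genetic algorithm (Section 3.4) would be used in the same way, with crossValidateSVR serving as its fitness function.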
4. Estimating physical properties

4.1. The data

The atmospheric distillation column (ADC) subject to this study is an actual operating unit in İzmit Refinery of Turkish Petroleum Refineries Corporation. The ADC, together with its pump-arounds, side-strippers and condenser drum, is depicted in Fig. 4.

[Fig. 4. Atmospheric distillation column.]

The crude distillation unit as a whole was designed for processing crude oil from Kirkuk, Iraq, but currently processes different kinds of crude oils. A pre-flash column separates LPG and Light Naphtha and sends the remaining stream to the ADC. Then, the ADC separates the feed into Naphtha, Heavy Naphtha, Kerosene, Light and Heavy Diesel and Atmospheric Bottom.

There are critical hydrocarbon properties for the products of the atmospheric distillation column, as given in Table 2, and these properties are optimized daily by the Planning Department according to changing prices and demand. The plant operators receive the optimal values and monitor these properties together with online plant measurements like temperature, pressure and flowrates. There are no online analyzers for the measurement of the critical hydrocarbon properties; instead, a sample from each of the streams is collected three times a day and is analyzed at the laboratory. There are also robust quality estimators, which are nonlinear regression functions employed for online prediction of the hydrocarbon properties.

Table 2. Critical hydrocarbon properties.
  Product         Property
  Light naphtha   Distillation curve 5% point
  Light naphtha   Distillation curve 95% point
  Heavy naphtha   Flash point
  Kerosene        Flash point
  Kerosene        Distillation curve 95% point
  Light diesel    Distillation curve 95% point
  Heavy diesel    Distillation curve 95% point

The distillation curve in Fig. 5 gives a plot of temperature versus the fraction of the liquid boiled at that temperature. The most critical properties are the temperatures at which 5% and 95% of the sample boils. The distillation column cannot perfectly separate the products, so the lighter end of the heavy product can be seen in the lighter product and vice versa. The transitioning boiling point temperature between two products is called the cut point. The cut point can be controlled by manipulating the flowrate of a side-drawn product. If the flowrate of the lighter product is increased, more of the heavier product moves up to the tray of the light product, leading to an increase in the tray temperature and the 95%-point temperature of the light product. As a result, the production rate of a product can be maximized within the specification limits for the 5%- and 95%-point temperatures.

[Fig. 5. Distillation curve of diesel.]

This study focuses on estimation of the 95%-point temperature of heavy diesel, since this property should be at its maximum specification value or a valuable fraction of heavy diesel is lost to atmospheric bottom. This property can be estimated by the flash zone temperature and pressure and the heavy diesel tray temperature. Although these variables are not the only inputs which affect the product quality, they are the most obvious choices. Increasing or decreasing the number of inputs will significantly affect the performance. We will look into the feature selection and generation issue separately in future research. The variables used in the SVR and in the current quality estimator are given in Table 3.

Table 3. Variables used for regression.
  Type     Variable
  Output   Heavy diesel 95% point
  Input    Flash zone temperature
  Input    Flash zone pressure
  Input    Heavy diesel tray temperature

Online plant data as well as laboratory analysis results can be accessed from the process historical database, which has years of data stored. The laboratory analysis data was gathered as raw data, since it is stored three times a day, while online plant data was gathered as the average of thirty minutes of raw data for smoothing noise. A sample is gathered and analyzed only when the ADC is in a stable operating condition, and the corresponding online measurements for the thirty-minute period ending with the timestamp of the sample were gathered.

4.2. The procedure

The algorithm for normalizing and splitting data, SVR training and SVR parameter optimization was coded in MATLAB®. The Quadratic Programming tool of MATLAB® was employed in SVR training, and the Cross Validation, Genetic Algorithm, Pattern Search and Constrained Nonlinear Programming tools of MATLAB® were employed in SVR parameter optimization.

We have 413 sample sets dating from the beginning of the year 2019 until the start of June 2019. The input data is normalized between −1 and 1 by anisotropic scaling. Then the dataset is split into training and testing sets with a 15% testing ratio. The sizes of the training and testing sets are 352 and 61 respectively. The splitting ratio is decided based on the size of the dataset. The general rule of thumb is to use 65 to 85% of the data for training, to better model the underlying distribution, and then to test the results with the remaining data. Although this work focused on the hypertuning of SVR, the amount of training data is not selected as a decision variable. Data splitting is accomplished by first partitioning the dataset into segments of 15% ratio and then by splitting each segment into training and testing sets of 85% and 15% ratios respectively. The sizes of the segments and their training and testing sub-segments may differ by only one sample set.
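The segmented splitting described above can be sketched as follows: the chronologically ordered data set is cut into segments of roughly 15% of the samples each, and every segment contributes about 85% of its rows to training and 15% to testing. The paper does not state how rows are picked within a segment; in this illustrative sketch the last rows of each segment are held out, and all names are ours.

```matlab
% Segment-wise 85/15 split of a chronologically ordered data set (X, y).
l = size(X, 1);
nSeg = ceil(1 / 0.15);                            % segments of roughly 15% of the data each
edges = round(linspace(0, l, nSeg + 1));
trainIdx = [];  testIdx = [];
for s = 1:nSeg
    seg = (edges(s) + 1) : edges(s + 1);
    nTest = round(0.15 * numel(seg));             % ~15% of every segment goes to testing
    testIdx  = [testIdx,  seg(end - nTest + 1 : end)];
    trainIdx = [trainIdx, seg(1 : end - nTest)];
end
Xtrain = X(trainIdx, :);   ytrain = y(trainIdx);
Xtest  = X(testIdx,  :);   ytest  = y(testIdx);
```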
The kernel is then selected for the SVR. This study employed Linear, Polynomial and Gaussian Radial Basis Function kernels. The initial values and the upper and lower bounds for the main SVR parameters and the kernel parameters are supplied to the solvers for SVR parameter optimization purposes and are given in Table 4. It should be noted that initial values are only supplied to Grid Search and Nonlinear Programming. SVR has two important parameters ε and C that affect the balance between flatness versus error of estimation. Parameter ε defines a margin of tolerance where no penalty is given to errors, and parameter C describes how much we penalize slack variables.

Table 4. Defined bounds for SVR parameters.
  Parameter                         Initial value   Lower bound   Upper bound
  Cost of error, C                  1E−03           1E−03         1000
  Width of insensitive region, ε    1E−03           1E−03         10
  Polynomial degree, d              1               1             10
  Radial basis function gain, γ     1E−06           1E−06         1000

Then the solver is selected for the optimization of the SVR parameters. This study has employed Genetic Algorithm, Grid Search and Nonlinear Programming. The population size is selected based on trial and error. The use of a smaller population resulted in lower accuracy, but an increase in population size only increased the computational resources used, not the accuracy. The solver parameters are given in Tables 4a, 5 and 6 respectively.

Table 4a. Defined parameters for genetic algorithm.
  Parameter                Value
  Population size, P       20
  Generation limit, Gmax   20
  Elite count, sE          4
  Cross-over ratio, rC     0.6
  Creation function        Uniform
  Selection function       Tournament
  Tournament count         4
  Cross-over function      Scattered
  Mutation function        Adaptive feasible

Table 5. Defined parameters for adaptive grid search.
  Parameter       Value
  Iteration       5
  C bound         1–1000
  Epsilon bound   0.001–1
  Sigma bound     0.1–100

Table 6. Defined parameters for nonlinear programming.
  Parameter         Value
  Iteration limit   300
  Algorithm         interior point

The cross validation embedded inside the optimization solvers uses k-fold partitioning and the partition count is set to 5, as given in Table 7.

Table 7. Defined parameters for cross validation.
  Parameter             Value
  Partitioning method   k-fold
  Partition count       5

Only the training set is supplied to parameter optimization; cross validation further partitions the training set into 5 subsets, with each subset treated once as a validation set inside the optimization solver.

When optimization ends, the SVR is trained one final time on the total training set with the optimized SVR parameters, and then the testing data is used for monitoring the performance of the SVR.

5. Results and discussion

5.1. Optimization results

The use of β_i = α_i − α_i* and β_i* = α_i + α_i* as decision variables in the QP greatly improved the rate of convergence. The CPU time of SVR training with α_i and α_i* as decision variables is approximately 1.9 s, while the CPU time of SVR training with the simplified variables was reduced by 80% to a range between 350 and 400 ms, even though the problem size is the same. This improvement is achieved by halving the number of decision variables in the second-degree part of the cost function, which creates sparseness in the H matrix.

The estimation performance of linear regression is given in Table 8, together with a comparison with the best SVR. The estimation of linear regression is compared with the laboratory analysis data in Fig. 6.

Table 8. Comparison of linear regression against SVR.
  Error      Linear reg.   Best SVR
  Mean       4.01          3.3 (GS-Gaus.)
  Std dev.   0.65          0.6 (GS-Gaus.)
  Max        12.7          18.9 (GS-Gaus.)

Genetic algorithm optimization results are given in Table 9, together with a comparison against the existing quality estimator. The table also includes the optimized SVR parameters, the number of support vectors and the CPU time. SVR estimations with Gaussian RBF kernels against laboratory analysis data are given in Fig. 7.

The results for optimization of the SVR parameters by the Grid Search method are given in Table 10, and plots of the estimation with Gaussian RBF kernels versus laboratory analysis data are given in Fig. 8.
[Fig. 6. Plot of linear regression estimation vs. laboratory analysis test data.]

[Fig. 7. Plot of SVR with genetic algorithm and Gaussian RBF kernel.]

Table 9. Results for SVR optimized with genetic algorithm.
                   Linear kernel   Polynomial kernel   Gaussian kernel   Linear regression
  Mean abs error   4.0             4.03                3.8               4.01
  Std deviation    0.66            0.65                0.6               0.65
  Max error        12.30           11.12               12.3              12.7
  C best           116             2.3                 1.18              -
  Epsilon best     0.03            0.3                 0.4               -
  Sigma best       74              36.5                0.2               -
Table 10. Results for SVR optimized with grid search.
                   Linear kernel   Polynomial kernel   Gaussian kernel   Linear regression
  Mean abs error   3.99            4.06                3.99              4.01
  Std deviation    0.66            0.68                0.65              0.65
  Max error        12.44           12.97               10.64             12.7
  C best           40              1                   1300              -
  Epsilon best     0.0001          0.001               0.0001            -
  Sigma best       0.1             0.1                 9.2               -

[Fig. 8. Plot of SVR with grid search and Gaussian RBF kernel.]

6. Discussion of optimization results

The results show that different optimization solvers have found different local optimal points with similar mean absolute errors and standard deviations for the same kernels, meaning that there are many similar local optima. The genetic method triumphed in most of the criteria, while SVR with the Gaussian RBF kernel optimized by Grid Search has the best performance in testing. The Grid Search method has the highest computational cost.

It should also be noted that plain linear regression showed quite good performance; apart from the Gaussian kernel tuned by GA, the differences between the methods can be considered negligible. The overall results show that the Grid Search resolution should be increased to improve its accuracy. Otherwise, the genetic algorithm should be used to tune the parameters in this case.

7. Conclusions

In this paper, Support Vector Regression (SVR) was employed for estimating the 95% Boiling Point Temperature property of the Heavy Diesel product of a real operating Atmospheric Distillation Column (ADC) in İzmit Refinery of Turkish Petroleum Refineries Corporation. Linear, Polynomial and Gaussian Radial Basis Function (Gaussian RBF) kernels were tested, and the SVR parameters were optimized by embedding k-fold cross validation in an optimizer such as Genetic Algorithm (GA) and Grid Search (GS). The performance of SVR was compared against the linear regression method already functioning in the ADC.

The SVR parameters were optimized in a time period ranging from 5 min to 10 min, but once the parameters are optimized, the SVR can be trained with the optimum parameters in approximately 350 ms. As a result, the SVR model can be updated with incoming laboratory analysis data within a second, while the SVR parameters can be updated periodically at monthly intervals.

The SVR training time is reduced by approximately 80%, from 1.9 s to 350 ms, by incorporating the simplified decision variables instead of using the Lagrange multipliers as decision variables. This improvement is achieved by creating sparseness in the second-degree part of the cost function.

The testing results show that SVR with any of the integrated kernels performed well in estimating the property and generalized well to unseen testing data. The mean absolute testing error of the linear regression method is improved by 5% with SVR, from 4 °C to 3.8 °C. SVR with the Gaussian kernel optimized by Genetic Algorithm has shown top performance in nearly all criteria. In order to understand whether this decrease in error is significant or not, we need to know the measurement uncertainty of the laboratory. Unfortunately, the measurement uncertainty of the TÜPRAŞ laboratory cannot be shared with readers because of confidentiality. Nevertheless, it must be noted that this error can be reduced further by adding data preprocessing and feature selection methods.

Being still a recent subject in Machine Learning, modified versions of SVR can be implemented for improving the SVR parameter optimization process. One of the promising versions is ν-Support Vector Regression, where the width of the insensitive region is optimized within the training process by fixing the fraction of support vectors in a training set. The fraction is a real number between 0 and 1 and can be optimized in the SVR parameter optimization step.

Also, there may be other known column dynamics correlating with the output parameter. The online measurements incorporated in the linear regression method and in this study were selected by experts of the process, but a principal component analysis (PCA) can be conducted to see the effects of other column measurements. We will look into the feature selection and generation issue separately in future research.
Finally, in this work only the heavy diesel 95% point is focused on and estimated. However, there are many quality parameters to be predicted in the refinery process, such as the kerosene and naphtha 95% points. We believe that very similar improvements can be achieved by using the same approach.

Declaration of Competing Interest

None.

CRediT authorship contribution statement

Ahmet Can Serfidan: Investigation, Data curation, Software, Validation, Visualization, Writing - review & editing. Firat Uzman: Conceptualization, Data curation, Formal analysis, Validation, Writing - original draft. Metin Türkay: Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Writing - review & editing.

References

Aizerman, A., Braverman, E.M., Rozoner, L., 1964. Theoretical foundations of the potential function method in pattern recognition learning. Autom. Remote Control 25, 821–837.
API, 2008. Basic Petroleum Data Book, 28. American Petroleum Institute.
Basak, K., et al., 2002. On-line optimization of a crude distillation unit with constraints on product properties. Ind. Eng. Chem. Res. 41 (6), 1557–1568.
Bennett, K.P., Mangasarian, O.L., 1992. Robust linear programming discrimination of two linearly inseparable sets. Optimiz. Methods Softw. 1 (1), 23–34.
Boser, B.E., Guyon, I.M., Vapnik, V.N., 1992. A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, Pennsylvania, USA. ACM, pp. 144–152.
Boston, J.F., Sullivan, S.L., 1974. A new class of solution methods for multicomponent, multistage separation processes. Can. J. Chem. Eng. 52 (1), 52–63.
Chang, C.-C., Lin, C.-J., 2011. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2 (3), 27.
Chitralekha, S.B., Shah, S.L., 2010. Application of support vector regression for developing soft sensors for nonlinear processes. Can. J. Chem. Eng. 88 (5), 696–709.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learn. 20 (3), 273–297.
Fletcher, R., Sainz de la Maza, E., 1989. Nonlinear programming and nonsmooth optimization by successive linear programming. Math. Program. 43 (1–3), 235–256.
Gary, J.H., Handwerk, G.E., Kaiser, M.J., 2010. Petroleum Refining: Technology and Economics. CRC Press.
Gunn, S.R., 1998. Support vector machines for classification and regression. ISIS Technical Report.
Hofeling, B.S., Seader, J.D., 1978. A modified Naphtali-Sandholm method for general systems of interlinked, multistaged separators. AIChE J. 24 (6), 1131–1134.
Inamdar, S.V., Gupta, S.K., Saraf, D.N., 2004. Multi-objective optimization of an industrial crude distillation unit using the elitist non-dominated sorting genetic algorithm. Chem. Eng. Res. Des. 82 (5), 611–623.
Khoury, F.M., 2014. Multistage Separation Processes, fourth ed. Taylor & Francis.
Kumar, V., et al., 2001. A crude distillation unit model suitable for online applications. Fuel Process. Technol. 73 (1), 1–21.
Lahiri, S.K., Ghanta, K.C., 2009. Hybrid support vector regression and genetic algorithm technique - a novel approach in process modeling. Chem. Prod. Process Model. 4 (1), 14–23.
Lee, D.E., et al., 2005. Weighted support vector machine for quality estimation in the polymerization process. Ind. Eng. Chem. Res. 44 (7), 2101–2105.
Leffler, W.L., 1985. Petroleum Refining for the Nontechnical Person, second ed. Pennwell Corp.
Li, L., Su, H., Chu, J., 2009. Modeling of isomerization of C8 aromatics by online least squares support vector machine. Chin. J. Chem. Eng. 17 (3), 437–444.
More, R.K., et al., 2010. Optimization of crude distillation system using Aspen Plus: effect of binary feed selection on grass-root design. Chem. Eng. Res. Des. 88 (2), 121–134.
Naphtali, L.M., Sandholm, D.P., 1971. Multicomponent separation calculations by linearization. AIChE J. 17 (1), 148–153.
Peng, D.-Y., Robinson, D.B., 1976. A new two-constant equation of state. Ind. Eng. Chem. Fundam. 15 (1), 59–64.
Redlich, O., Kwong, J., 1949. On the thermodynamics of solutions. V. An equation of state. Fugacities of gaseous solutions. Chem. Rev. 44 (1), 233–244.
Russell, R.A., 1983. A flexible and reliable method solves single-tower and crude distillation-column problems. Chem. Eng. 90 (21), 52–59.
Seo, J.W., Oh, M., Lee, T.H., 2000. Design optimization of a crude oil distillation process. Chem. Eng. Technol. 23 (2), 157–164.
Shi, Y., Eberhart, R., 1998. A modified particle swarm optimizer. In: Proceedings of the 1998 IEEE International Conference on Evolutionary Computation: IEEE World Congress on Computational Intelligence. IEEE.
Shokri, S., et al., 2015. Soft sensor design for hydrodesulfurization process using support vector regression based on WT and PCA. J. Cent. South Univ. 22 (2), 511–521.
Smola, A.J., Schölkopf, B., 1998. A tutorial on support vector regression. NeuroCOLT2 Technical Report Series.
Smola, A., Schölkopf, B., 2004. A tutorial on support vector regression. Stat. Comput. 14 (3), 199–222.
Soave, G., 1972. Equilibrium constants from a modified Redlich–Kwong equation of state. Chem. Eng. Sci. 27 (6), 1197–1203.
Vapnik, V.N., Chervonenkis, A.Y., 1968. On the uniform convergence of relative frequencies of events to their probabilities. Dokl. Akad. Nauk SSSR 181 (4), 781–783.
Vapnik, V., Lerner, A., 1963. Generalized portrait method for pattern recognition. Autom. Remote Control 24 (6), 774–780.
Yan, W., Shao, H., Wang, X., 2004. Soft sensing modeling based on support vector machine and Bayesian model selection. Comput. Chem. Eng. 28 (8), 1489–1498.
Yao, H., Chu, J., 2012. Operational optimization of a simulated atmospheric distillation column using support vector regression models and information analysis. Chem. Eng. Res. Des. 90 (12), 2247–2261.
