Serfidan 2020
Article history: Received 15 June 2019; Revised 30 November 2019; Accepted 27 December 2019; Available online 2 January 2020.

Keywords: Data analytics; Optimization; Parameter estimation; Support vector regression; Atmospheric distillation

Abstract

The atmospheric distillation column is one of the most important units in an oil refinery, where crude oil is fractionated into its more valuable constituents. Almost all state-of-the-art online equipment has a time lag in completing physical property analyses in real time due to the complexity of the analyses. Therefore, estimating the physical properties from online plant data with a soft sensor has significant benefits. In this paper, we estimate the physical properties of the hydrocarbon products of an atmospheric distillation column by support vector regression (SVR) using Linear, Polynomial and Gaussian Radial Basis Function kernels, and the SVR parameters are optimized by a variety of algorithms including genetic algorithm, grid search and nonlinear programming. The optimization-based data analytics approach is shown to produce superior results compared to linear regression: the mean testing error of estimation is improved by 5% with SVR, from 4.01 °C to 3.8 °C.

© 2020 Elsevier Ltd. All rights reserved.
https://doi.org/10.1016/j.compchemeng.2019.106711
2 A.C. Serfidan, F. Uzman and M. Türkay / Computers and Chemical Engineering 134 (2020) 106711
More recent studies have utilized Support Vector Machines (SVM) in regression problems. SVMs, in contrast to ANNs, embody Structural Risk Minimization, which aims to generate a flatter and less complex function. By implementing a new loss function, Support Vector Regression (SVR) chooses the flattest function where the error is kept within a predefined insensitive region, and a predefined cost factor handles outlier data. Moreover, by employing kernel functions, SVR can be trained to generate a nonlinear estimation function. Lastly, the user-defined parameters of SVR can be optimized via cross validation embedded in global optimizers like the Genetic Algorithm, or simpler solvers like Grid Search, to maximize generalization performance.

2. Literature survey

Since Crude Distillation Units are complex and energy intensive units, safe and optimum operation of the unit has great significance in oil refining. This leads to a need for proper monitoring of the operation by process engineers and proper estimation of the operation dynamics by planning engineers. Both objectives prove to be troublesome, as the physical properties can be measured either by taking a sample from the stream periodically and then analyzing these samples in a laboratory with appropriate equipment, or by online analyzers that are very expensive, complex and hard-to-maintain sets of equipment. The periodicity in measurement leads to sub-optimal control of the unit or, in some cases, to off-spec products. This gives rise to the need for estimating the plant dynamics and physical properties of the product streams. Different methods, like rigorous distillation column models that depend on the laws of thermodynamics and statistical models that depend on online plant data, have been incorporated to estimate the behavior of a distillation column.

First-principle mechanistic models have been developed for simulating a steady-state, multicomponent and multistage distillation column. A rigorous method by Naphtali and Sandholm (1971) has been further improved by Boston and Sullivan (1974), Hofeling and Seader (1978), Russell (1983), Kumar et al. (2001) and many others. Many researchers have further studied the use of rigorous methods in the optimization of distillation columns (Basak et al., 2002; More et al., 2010; Inamdar et al., 2004; Seo et al., 2000).

Any rigorous model has mass and energy balance equations, needs a comprehensive physical properties library and is very complex. One needs extensive knowledge of the method to be able to fit a model to an existing distillation column, and the model usually deviates from the real data in time. The model is only as good as the physical property estimation library selected for predicting properties like phase equilibrium coefficients and enthalpies.

The enthalpies cannot be calculated under the assumption that the components act like an ideal gas, since the molecules attract and repulse each other and have finite volume. Equations of state proposed by many academicians (Peng and Robinson, 1976; Redlich and Kwong, 1949; Soave, 1972) can be used to solve for the phase equilibrium coefficients and excess enthalpies. The Peng–Robinson equation of state (PR) (Peng and Robinson, 1976) and the Soave modification of Redlich–Kwong (SRK) (Soave, 1972) are popularly used for predicting these properties of hydrocarbons.

Crude oil contains over one million different molecules, some of which are non-hydrocarbon molecules. Light hydrocarbons with up to 6 or 7 carbon atoms may be defined as pure components, but the rest of the molecules are grouped by true boiling point ranges as pseudo-components. To solve for phase equilibrium coefficients and enthalpies, one needs the properties of these components, like the critical temperature T_C, critical pressure P_C and acentric factor ω.

Properties of pseudo-components like molecular weight, critical temperature, critical pressure, acentric factor, heat of formation and ideal gas enthalpy can be predicted by empirical correlations which are documented in the Petroleum Data Book of the American Petroleum Institute (API, 2008).

A commonly used method in many chemical process simulators is the inside-out algorithm by Russell (1983), which is explained in Khoury (2014). This method simplifies the phase equilibrium coefficient and enthalpy calculations, which are computationally expensive, by linearizing the functions in terms of the stage temperature. The inner loop solves the model according to the performance specifications given by the user with the simplified functions. Then the outer loop updates the phase equilibrium coefficient and enthalpy calculations and checks if the update is within the specified limits. The algorithm terminates if the update is within the defined limits. If it is not, the functions are linearized again for another set of inner loop iterations. This method requires little information for generating initial estimates and has a lower computational cost.

Statistical models are widely used for predicting the relationship between a dependent variable and one or more independent variables. There are many methods for generating a statistical model, and the most common among them is the least squares regression method, which may give misrepresentative results with outliers and noise in the training data and may not generalize well to unseen and unchecked testing data.

The concept of Support Vector Machines (SVM) was established by Vapnik and Lerner (1963) and Vapnik and Chervonenkis (1968) in 1963 for solving classification problems. Then nonlinear classification was made possible by the introduction of kernel functions, proposed by Boser et al. (1992), which had been suggested by Aizerman et al. (1964). The problem of outliers was finally solved by soft margins, proposed by Cortes and Vapnik (1995).

Interest in SVM has increased over the last 15 years due to its distinctive features and performance in solving regression and classification problems. A detailed technical report on SVM for its use in classification and regression problems was published by Gunn (1998) in 1998. Detailed tutorials on Support Vector Regression (SVR) were published by Smola and Schölkopf (1998, 2004) in 1998 and 2004. A general purpose library has been distributed by Chang and Lin (2011) as LIBSVM, which can be integrated into many programming languages and data mining software.

SVM has also been used in the modeling of chemical processes like polymerization (Lee et al., 2005), desulfurization (Shokri et al., 2015), polymer extrusion (Chitralekha and Shah, 2010) and isomerization (Li et al., 2009). Yao and Chu (2012) employed SVM in the modeling and optimization of an atmospheric distillation column. Yan et al. (2004) developed a soft sensor for estimating the freezing point of light diesel in a distillation column. Lahiri and Ghanta (2009) employed SVR in estimating the critical velocity of solid-liquid slurry flow.

Yao and Chu (2012) applied SVR to a real atmospheric distillation column and optimized the SVR parameters using a particle swarm optimizer with linearly decreased inertia weight (LinWPSO), a version of a global optimizer proposed by Shi and Eberhart (1998). Lahiri and Ghanta (2009) optimized the SVR parameters using a genetic algorithm (GA). Yao and Chu (2012) modeled the atmospheric distillation column in a commercial chemical process simulation tool, Aspen Plus, and used data generated from case studies. Since the training data contained simulation results, there was neither noise nor outliers. Therefore, the SVR parameter ε to be explained below (see Eqs. (2) and (3) and Fig. 1) was fixed to 0 (zero), which defeats the purpose of creating sparsity in the set of support vectors.

The difference between SVM and the other regression methods is that SVM minimizes the structural risk, as opposed to other regression methods, which employ empirical risk minimization. Empirical risk minimization only focuses on minimizing the error on the training data.
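The contrast between a quadratic (least-squares) loss and the ε-insensitive loss introduced below can be illustrated with a small sketch; the residual values and function names here are illustrative, not from the paper:

```python
# Sketch: a single outlier dominates a quadratic (least-squares) loss,
# while the eps-insensitive loss counts large errors only linearly
# and ignores errors inside the tolerance band entirely.

def quadratic_loss(residuals):
    # Sum of squared residuals, as in least-squares regression
    return sum(r ** 2 for r in residuals)

def eps_insensitive_loss(residuals, eps):
    # Errors smaller than eps are ignored; larger errors count linearly
    return sum(max(abs(r) - eps, 0.0) for r in residuals)

residuals = [0.2, -0.3, 0.1, -0.2, 5.0]   # last residual is an outlier

q = quadratic_loss(residuals)              # the outlier alone contributes 25.0
e = eps_insensitive_loss(residuals, 0.5)   # only the outlier exceeds the band
```

With these made-up residuals, the outlier contributes about 99% of the quadratic loss but only a bounded linear term to the ε-insensitive loss, which is why the latter is more robust to outliers.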
A special loss function has been introduced to Support Vector Regression, similar to, but different from, widely known loss functions like the Quadratic, Laplace and Huber functions. This loss function is insensitive to errors less than an error tolerance ε, which enables sparsity in the set of support vectors (Cortes and Vapnik, 1995). This new loss function is named the ε-insensitive function and will be further discussed together with the other loss functions mentioned.

The first attempt at Support Vector Regression used a hard constraint: the goal was to estimate y with a function f(x) that has at most ε deviation from the training data and at the same time has the lowest model complexity, i.e., is as flat as possible. This means any error lower than ε is neglected, while any error greater than ε is not tolerated. For flatness and lower model complexity, one should pursue a small weight vector w; to do this we can minimize \(\|w\|^2\), which is the dot product \(\langle w, w\rangle\). The error of estimation will have a hard constraint.

The Lagrange function of the problem is
\[
L = \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i+\xi_i^*)
- \sum_{i=1}^{l}\alpha_i\,(\varepsilon+\xi_i-y_i+\langle w,x_i\rangle+b)
- \sum_{i=1}^{l}\alpha_i^*\,(\varepsilon+\xi_i^*+y_i-\langle w,x_i\rangle-b)
- \sum_{i=1}^{l}\eta_i\xi_i - \sum_{i=1}^{l}\eta_i^*\xi_i^*. \tag{4}
\]
The Lagrange multipliers are \(\eta_i, \eta_i^*, \alpha_i, \alpha_i^*\) and they are non-negative:
\[
\eta_i, \eta_i^*, \alpha_i, \alpha_i^* \ge 0. \tag{5}
\]
The Lagrange function has a saddle point where its partial derivatives with respect to the primal variables \(w, b, \xi_i, \xi_i^*\) are equal to 0 (zero) at the optimal point:
\[
\frac{\partial L}{\partial w} = 0 = w - \sum_{i=1}^{l}(\alpha_i-\alpha_i^*)\,x_i, \tag{6}
\]
\[
\frac{\partial L}{\partial b} = 0 = \sum_{i=1}^{l}(\alpha_i-\alpha_i^*), \tag{7}
\]
\[
\frac{\partial L}{\partial \xi_i} = 0 = C - \alpha_i - \eta_i, \tag{8}
\]
\[
\frac{\partial L}{\partial \xi_i^*} = 0 = C - \alpha_i^* - \eta_i^*. \tag{9}
\]
The optimality conditions above help us simplify the dual problem so that it is expressed in terms of the dual variables, which are the Lagrange multipliers. In this case, Eqs. (8) and (9) can be used to eliminate the Lagrange multipliers \(\eta_i, \eta_i^*\) by rearranging the equations as:
\[
\eta_i = C - \alpha_i, \qquad \eta_i^* = C - \alpha_i^*. \tag{10}
\]
By substituting Eqs. (10) and (11) into Eq. (4) with constraints deduced from Eqs. (7) and (10), the primal variables \(w, b, \xi_i, \xi_i^*\) are eliminated and we get the dual problem:
\[
\max\; -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\langle x_i,x_j\rangle
- \varepsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^*) + \sum_{i=1}^{l} y_i(\alpha_i-\alpha_i^*),
\]
\[
\text{s.t.}\quad \sum_{i=1}^{l}(\alpha_i-\alpha_i^*) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C. \tag{13}
\]
Now that the optimization problem does not include w, support vector regression problems, especially those with nonlinear feature space transformations, can be solved independently of the dimension of the feature space with quadratic programming.

2.2. Training with quadratic programming

Quadratic programming minimizes a quadratic objective function of the decision variables subject to linear equality and inequality constraints:
\[
\min\; \frac{1}{2}x^{T}Hx + f^{T}x,
\qquad \text{s.t.}\quad Ax \le b,\;\; A_{eq}x = b_{eq},\;\; lb \le x \le ub. \tag{14}
\]
For minimization, the dual problem of Eq. (13) is written as:
\[
\min\; \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\langle x_i,x_j\rangle
+ \varepsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^*) - \sum_{i=1}^{l} y_i(\alpha_i-\alpha_i^*),
\]
\[
\text{s.t.}\quad \sum_{i=1}^{l}(\alpha_i-\alpha_i^*) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C. \tag{15}
\]
We need to solve for the optimum values of the Lagrange multipliers \(\alpha_i\) and \(\alpha_i^*\). This study uses \(\beta_i = \alpha_i - \alpha_i^*\) and \(\beta_i^* = \alpha_i + \alpha_i^*\) as the decision variables in the quadratic programming to create sparseness in the H matrix for faster convergence. The size of the decision variable vector is 2l, two times the number of training samples. The problem is reformulated as:
\[
\min\; \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\beta_i\beta_j\langle x_i,x_j\rangle
+ \varepsilon\sum_{i=1}^{l}\beta_i^* - \sum_{i=1}^{l} y_i\beta_i,
\qquad \text{s.t.}\quad
\sum_{i=1}^{l}\beta_i = 0,\;\;
0 \le \beta_i+\beta_i^* \le 2C,\;\;
0 \le \beta_i^*-\beta_i \le 2C, \tag{16}
\]
with the Gram matrix
\[
M_{l\times l} =
\begin{bmatrix}
\langle x_1,x_1\rangle & \langle x_1,x_2\rangle & \cdots & \langle x_1,x_l\rangle\\
\langle x_2,x_1\rangle & \langle x_2,x_2\rangle & \cdots & \langle x_2,x_l\rangle\\
\vdots & \vdots & \ddots & \vdots\\
\langle x_l,x_1\rangle & \langle x_l,x_2\rangle & \cdots & \langle x_l,x_l\rangle
\end{bmatrix}. \tag{17}
\]
Then, the H matrix of the quadratic programming is written as
\[
H_{2l\times 2l} =
\begin{bmatrix}
M & 0\\
0 & 0
\end{bmatrix}, \tag{18}
\]
and the f vector is
\[
f_{2l\times 1} = \begin{bmatrix}-y_1 & \cdots & -y_l & \varepsilon & \cdots & \varepsilon\end{bmatrix}^{T}. \tag{19}
\]
Then the matrices \(A, A_{eq}\) and the vectors \(b, b_{eq}\) are formed according to the constraints in Eq. (16).

2.3. The support vectors

The Karush–Kuhn–Tucker (KKT) conditions, or complementary slackness conditions, state that at the optimum point the product between the dual variables and the primal constraints should be equal to zero:
\[
\alpha_i\,(\varepsilon+\xi_i-y_i+\langle w,x_i\rangle+b) = 0,
\qquad
\alpha_i^*\,(\varepsilon+\xi_i^*+y_i-\langle w,x_i\rangle-b) = 0. \tag{20}
\]
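As a minimal sketch of how the H matrix of Eq. (18) and the f vector of Eq. (19) are assembled, the following pure-Python code builds them for a made-up toy dataset (the paper's implementation is in MATLAB; this is only an illustration):

```python
# Sketch (pure Python): assembling the quadratic-programming matrices H and f
# for the beta-form dual, following Eqs. (17)-(19). The toy data are made up.

def gram_matrix(X):
    # M[i][j] = <x_i, x_j>, the dot product of training inputs (Eq. 17)
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def qp_matrices(X, y, eps):
    l = len(X)
    M = gram_matrix(X)
    # H is 2l x 2l: M in the top-left block, zeros elsewhere, because only
    # beta (not beta*) enters the second-degree term of the objective (Eq. 18)
    H = [[M[i][j] if i < l and j < l else 0.0 for j in range(2 * l)]
         for i in range(2 * l)]
    # f = [-y_1, ..., -y_l, eps, ..., eps]  (Eq. 19)
    f = [-yi for yi in y] + [eps] * l
    return H, f

X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
H, f = qp_matrices(X, y, 0.1)
# H is 6x6 with the 3x3 Gram matrix in its top-left corner;
# f = [-1.0, -2.0, -3.0, 0.1, 0.1, 0.1]
```

The zero blocks in H are exactly the sparseness the β-reformulation creates: the linear terms εΣβ* and −Σyβ appear only in f.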
Also, the product of the primal variables and the dual constraints should be equal to zero:
\[
\xi_i\,(C-\alpha_i) = 0, \qquad \xi_i^*\,(C-\alpha_i^*) = 0. \tag{21}
\]
From Eq. (20), we can see that \(\alpha_i\alpha_i^* = 0\), meaning at least one of \(\alpha_i\) or \(\alpha_i^*\) should be equal to zero. From Eq. (21), training data with \(\alpha_i = C\) or \(\alpha_i^* = C\) can have positive primal slack variables \(\xi_i\) or \(\xi_i^*\), meaning the training data can be outside of the ε-insensitive zone if and only if \(\alpha_i\) or \(\alpha_i^*\) is equal to C.

Remember that in Eq. (11) the term w is expanded into a combination of the input vectors. From Eq. (20), when the estimation error \(|f(x_i) - y_i|\) is lower than ε, both Lagrange multipliers \(\alpha_i\) and \(\alpha_i^*\) should be equal to zero, since the slack variables have to be nonnegative. Therefore, \((\alpha_i - \alpha_i^*)\) vanishes in Eq. (11), which creates sparsity in the expansion of w. In conclusion, the training data with estimation error \(|f(x_i) - y_i| \ge \varepsilon\) have one of the Lagrange multipliers at a nonzero value, which makes them Support Vectors. From now on the support vectors are \(x_s\) for \(s = 1, 2, \ldots, S \le l\), with coefficients \(\beta_s = (\alpha_s - \alpha_s^*)\). The estimation function is rewritten as:
\[
f(x) = \sum_{s=1}^{S}\beta_s\langle x_s,x\rangle + b. \tag{22}
\]
Although we expect one or both of the Lagrange multipliers to be equal to zero at the optimal point, quadratic programming solves them to very small numbers close to zero, so support vectors are selected by the relation:
\[
|\alpha_i-\alpha_i^*| > 10^{-6}\,C. \tag{23}
\]

2.4. The bias term

We find the bias from the average over the support vectors with estimation error equal to ε. These support vectors are \(x_{ss}\) and are identified from the \(x_s\) with \(0 < |\beta_s| < C\). To filter out very small numbers, the relation is updated to:
\[
10^{-6}C < |\beta_s| < (1-10^{-6})\,C. \tag{24}
\]
Finally, the bias term is calculated by:
\[
b = \frac{1}{SS}\sum_{ss=1}^{SS}\Bigl(y_{ss} - \varepsilon\sigma - \sum_{s=1}^{S}\beta_s\langle x_{ss},x_s\rangle\Bigr). \tag{25}
\]
With a kernel function \(K(x_i, x_j)\), the quadratic programming problem of Eq. (16) is rewritten as:
\[
\min\; \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\beta_i\beta_j K(x_i,x_j)
+ \varepsilon\sum_{i=1}^{l}\beta_i^* - \sum_{i=1}^{l} y_i\beta_i,
\]
\[
\text{s.t.}\quad
\sum_{i=1}^{l}\beta_i = 0,\;\;
0 \le \beta_i+\beta_i^* \le 2C,\;\;
0 \le \beta_i^*-\beta_i \le 2C,\;\;
-C \le \beta_i \le C,\;\;
0 \le \beta_i^* \le 2C. \tag{27}
\]
The Eq. (11) for the term w is expanded to:
\[
w = \sum_{i=1}^{l}(\alpha_i-\alpha_i^*)\,\phi(x_i). \tag{28}
\]
The matrix M in the quadratic programming is rewritten as:
\[
M_{l\times l} =
\begin{bmatrix}
K(x_1,x_1) & K(x_1,x_2) & \cdots & K(x_1,x_l)\\
K(x_2,x_1) & K(x_2,x_2) & \cdots & K(x_2,x_l)\\
\vdots & \vdots & \ddots & \vdots\\
K(x_l,x_1) & K(x_l,x_2) & \cdots & K(x_l,x_l)
\end{bmatrix}. \tag{29}
\]
Finally, the estimation function in Eq. (22) is rewritten as:
\[
f(x) = \sum_{s=1}^{S}\beta_s K(x_s,x) + b, \tag{30}
\]
and the Eq. (25) for the bias term is revised to:
\[
b = \frac{1}{SS}\sum_{ss=1}^{SS}\Bigl(y_{ss} - \varepsilon\sigma - \sum_{s=1}^{S}\beta_s K(x_{ss},x_s)\Bigr). \tag{31}
\]
It should be noted that the mapping into a high dimensional feature space leads to the curse of dimensionality; that is, as the dimensionality increases, the volume of the space increases such that the training data becomes sparse in this high dimensional space. To overcome this problem, a large training set with an even data distribution should be provided to the SVR (Gunn, 1998).

There are various types of kernels that can be used in SVR, and combinations of these kernels are possible. Nonlinear kernels may have parameters that are predefined by the user, like the main parameters of the SVR, ε and C.

2.6. Linear kernel

This is simply the dot product of the vectors; no mapping is applied. Since the SVR will be coded with a kernel function, the linear kernel will be among the choices:
\[
K(x_i,x_j) = \langle x_i,x_j\rangle. \tag{32}
\]

2.7. Polynomial kernel

\[
K(x_i,x_j) = (\langle x_i,x_j\rangle + 1)^{d}, \tag{33}
\]
where d is the degree of the polynomial.

The Gaussian radial basis function (RBF) kernel is:
\[
K(x_i,x_j) = \exp\Bigl(-\frac{\|x_i-x_j\|^2}{2\sigma^2}\Bigr). \tag{34}
\]
This function can be simplified to:
\[
K(x_i,x_j) = \exp\bigl(-\gamma\|x_i-x_j\|^2\bigr), \tag{35}
\]
where \(\gamma = 1/(2\sigma^2)\), \(\gamma \in \mathbb{R}\), \(\gamma > 0\) is the kernel parameter.
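A sketch of the linear and Gaussian RBF kernels of Eqs. (32) and (35), together with the kernelized estimate of Eq. (30) restricted to support vectors selected as in Eq. (23); all numeric values below are illustrative, not from the paper's trained model:

```python
# Sketch (pure Python): linear and Gaussian RBF kernels, support-vector
# selection by |beta_i| > 1e-6 * C, and the kernelized estimate of Eq. (30).
import math

def linear_kernel(xi, xj):
    # K(xi, xj) = <xi, xj>  (Eq. 32)
    return sum(a * b for a, b in zip(xi, xj))

def rbf_kernel(xi, xj, gamma):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2), gamma = 1 / (2 sigma^2)  (Eq. 35)
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def predict(x, X, beta, b, kernel, C):
    # Only support vectors (|beta_i| above the numerical threshold) contribute
    return b + sum(bi * kernel(xi, x)
                   for xi, bi in zip(X, beta) if abs(bi) > 1e-6 * C)

X = [[0.0], [1.0], [2.0]]
beta = [0.5, 0.0, -0.5]   # the middle sample is not a support vector
f = predict([1.0], X, beta, b=1.0, kernel=linear_kernel, C=10.0)
# 1.0 + 0.5*<[0],[1]> - 0.5*<[2],[1]> = 1.0 + 0.0 - 1.0 = 0.0
```

A different kernel is dropped in without changing the expansion, e.g. `predict(x, X, beta, b, kernel=lambda a, c: rbf_kernel(a, c, 0.5), C=10.0)`.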
2.11. Product of kernels

Similarly, kernels can be formed by taking the product of two or more kernels:
\[
K(x_i,x_j) = \prod_{k} K_k(x_i,x_j). \tag{38}
\]

2.12. Loss functions

The Huber loss function in Fig. 2(c) is a combination of the Quadratic and Laplace loss functions: it is quadratic for small error values and linear for large error values. The functions and their slopes are equal at the intersection of the two branches. This property combines the sensitivity of the Quadratic loss function with the robustness of the Laplace loss function. The Huber loss function has the form:
\[
L(f(x)-y) =
\begin{cases}
\tfrac{1}{2}(f(x)-y)^2 & \text{for } |f(x)-y| < \mu,\\[2pt]
\mu\,|f(x)-y| - \tfrac{\mu^2}{2} & \text{otherwise.}
\end{cases} \tag{39}
\]
Table 2. Critical hydrocarbon properties.

Stream         Property
Light naphtha  Distillation curve 5% point
Light naphtha  Distillation curve 95% point
Heavy naphtha  Flash point
Kerosene       Flash point
Kerosene       Distillation curve 95% point
Light diesel   Distillation curve 95% point
Heavy diesel   Distillation curve 95% point

Table 3. Variables used for regression.

Role    Variable
Output  Heavy diesel 95% point
Input   Flash zone temperature
Input   Flash zone pressure
Input   Heavy diesel tray temperature

There are critical hydrocarbon properties for the products of the atmospheric distillation column, as given in Table 2, and these properties are optimized daily by the Planning Department according to changing prices and demand. The plant operators receive the optimal values and monitor these properties together with online plant measurements like temperature, pressure and flowrates. There are no online analyzers for the measurement of the critical hydrocarbon properties; instead, a sample from each of the streams is collected three times a day and is analyzed at the laboratory. There are also robust quality estimators, which are nonlinear regression functions employed for online prediction of the hydrocarbon properties.

The distillation curve in Fig. 5 gives a plot of temperature versus the fraction of the liquid boiled at that temperature. The most critical properties are the temperatures at which 5% and 95% of the sample boils. The distillation column cannot perfectly separate the products, so the lighter end of the heavy product can be seen in the lighter product and vice versa. The transitional boiling point temperature between two products is called the cut point. The cut point can be controlled by manipulating the flowrate of a side-drawn product. If the flowrate of the lighter product is increased, more of the heavier product moves up to the tray of the light product, leading to an increase in the tray temperature and the 95%-point temperature of the light product. As a result, the production rate of a product can be maximized within the specification limits for the 5%- and 95%-point temperatures.

This study focuses on estimation of the 95%-point temperature of heavy diesel, since this property should be kept at its maximum specification value, or a valuable fraction of heavy diesel is lost to the atmospheric bottoms. This property can be estimated from the flash zone temperature and pressure and the heavy diesel tray temperature. Although these variables are not the only inputs which affect the product quality, they are the most obvious choices. Increasing or decreasing the number of inputs will significantly affect the performance. We will look into the feature selection and generation issue separately in future research. The variables used in the SVR and in the current quality estimator are given in Table 3.

Online plant data as well as laboratory analysis results can be retrieved from the process historical database, which has years of data stored. The laboratory analysis data was gathered as raw data, since it is stored three times a day, while the online plant data was gathered as the average of thirty minutes of raw data for smoothing noise. A sample is gathered and analyzed only when the ADC is in a stable operating condition, and the corresponding online measurements over the thirty-minute period ending at the timestamp of the sample were gathered.

4.2. The procedure

The algorithm for normalizing and splitting data, SVR training and SVR parameter optimization was coded in MATLAB®. The Quadratic Programming tool of MATLAB® was employed in SVR training, and the Cross Validation, Genetic Algorithm, Pattern Search and Constrained Nonlinear Programming tools of MATLAB® were employed in SVR parameter optimization.

We have 413 sample sets dating from the beginning of the year 2019 until the start of June 2019. The input data is normalized between −1 and 1 by anisotropic scaling. Then the dataset is split
into training and testing sets with a 15% testing ratio. The sizes of the training and testing sets are 352 and 61, respectively. The splitting ratio is decided based on the size of the dataset. The general rule of thumb is to use 65 to 85% of the data for training, to better model the underlying distribution, and then test the results with the remainder. Although this work focuses on the hyperparameter tuning of SVR, the amount of training data is not selected as a decision variable. Data splitting is accomplished by first partitioning the dataset into segments of 15% ratio and then splitting each segment into training and testing sets of 85% and 15% ratios, respectively. The sizes of the segments and their training and testing sub-segments may differ by only one sample set.

The kernel is selected for the SVR. This study employed Linear, Polynomial and Gaussian Radial Basis Function kernels. The initial values and the upper and lower bounds for the main SVR parameters and the kernel parameters are supplied to the solvers for SVR parameter optimization and are given in Table 4. It should be noted that initial values are only supplied to Grid Search and Nonlinear Programming. SVR has two important parameters, ε and C, that affect the balance between flatness and error of estimation. Parameter ε defines a margin of tolerance where no penalty is given to errors, and parameter C describes how much the slack variables are penalized.

Then the solver is selected for the optimization of the SVR parameters. This study employed Genetic Algorithm, Grid Search and Nonlinear Programming. The population size was selected based on trial and error. The use of a smaller population resulted in lower accuracy, while an increase in population size only increased the computational load, not the accuracy. The solver parameters are given in Tables 4a, 5 and 6, respectively.

The cross validation embedded inside the optimization solvers uses k-fold partitioning, and the partition count is set to 5 as given in Table 7.

Only the training set is supplied to parameter optimization, and cross validation further partitions the training set into 5 subsets, with each subset treated once as a validation set inside the optimization solver.

When optimization ends, the total training set is trained for the last time with the optimized SVR parameters, and then the testing data is used for monitoring the performance of the SVR.

Table 4. Defined bounds for SVR parameters.

Table 5. Defined parameters for adaptive grid search.

Parameter      Value
Iteration      5
C bound        1–1000
Epsilon bound  0.001–1
Sigma bound    0.1–100

Table 6. Defined parameters for nonlinear programming.

5. Results and discussion

5.1. Optimization results

The use of \(\beta_i = \alpha_i - \alpha_i^*\) and \(\beta_i^* = \alpha_i + \alpha_i^*\) as decision variables in the QP greatly improved the rate of convergence. The CPU time of SVR training with \(\alpha_i\) and \(\alpha_i^*\) as decision variables is approximately 1.9 s, while the CPU time of SVR training with the simplified variables was reduced by 80% to a range between 350 and 400 ms, even though the problem size is the same. This improvement is achieved by halving the number of decision variables in the second-degree part of the cost function, which creates sparseness in the H matrix.

The estimation performance of linear regression is given in Table 8 together with a comparison with the best SVR results. The estimation of linear regression is compared with the laboratory analysis data in Fig. 6.

The genetic algorithm optimization results are given in Table 9 together with a comparison against the quality estimator and the best results. The table also includes the optimized SVR parameters, the number of support vectors and the CPU time. SVR estimations with Gaussian RBF kernels against laboratory analysis data are given in Fig. 7.

The results for optimization of the SVR parameters by the Grid Search method are given in Table 10, and plots of the estimation with Gaussian RBF kernels versus laboratory analysis data are given in Fig. 8.

6. Discussion of optimization results

The results show that different optimization solvers have found different local optimal points with similar mean absolute errors and standard deviations for the same kernels, meaning that there are many similar local optima. The genetic algorithm method triumphed in most of the criteria. SVR with the Gaussian RBF kernel optimized by Grid Search has the best performance in testing. The Grid Search method has the highest computational cost.
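The grid-search-with-embedded-cross-validation loop whose results are compared in this section can be sketched as below; `train_and_score` is a hypothetical stand-in for SVR training plus validation error, and the parameter grids are only illustrative echoes of the bounds in Table 5:

```python
# Sketch (pure Python): grid search over (C, eps, sigma) with an embedded
# k-fold cross validation, as used for SVR parameter tuning. The function
# `train_and_score` is a hypothetical stand-in supplied by the caller.
from itertools import product

def kfold_indices(n, k=5):
    # Partition range(n) into k contiguous folds of near-equal size
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search(data, train_and_score, C_grid, eps_grid, sigma_grid, k=5):
    best, best_err = None, float("inf")
    folds = kfold_indices(len(data), k)
    for C, eps, sigma in product(C_grid, eps_grid, sigma_grid):
        errs = []
        for held_out in folds:
            # Each fold serves once as the validation set
            train = [d for i, d in enumerate(data) if i not in set(held_out)]
            valid = [data[i] for i in held_out]
            errs.append(train_and_score(train, valid, C, eps, sigma))
        err = sum(errs) / len(errs)   # mean validation error over the folds
        if err < best_err:
            best, best_err = (C, eps, sigma), err
    return best, best_err
```

The resolution of the grids directly bounds how close the returned triple can get to a local optimum, which is why a coarse grid can lose to the genetic algorithm.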
It should be noted that linear regression also showed quite good performance. Apart from the GA-tuned Gaussian kernel, the differences between the others can be said to be negligible. The overall results show that the Grid Search resolution should be increased to improve its accuracy. Otherwise, the genetic algorithm should be used to tune the parameters in this case.

Fig. 6. Plot of linear regression estimation vs. laboratory analysis test data.
Fig. 7. Plot of SVR with genetic algorithm and Gaussian RBF kernel.
Fig. 8. Plot of SVR with grid search and Gaussian RBF kernel.
Table 9. Results for SVR optimized with genetic algorithm.
Table 10. Results for SVR optimized with grid search.

The testing results show that SVR with any of the integrated kernels performed well in estimating the property and generalized well to unseen testing data. The mean absolute testing error of the linear regression method is improved by 5% with SVR, from 4 °C to 3.8 °C. SVR with the Gaussian kernel optimized by the Genetic Algorithm showed top performance in nearly all criteria. In order to understand whether this decrease in error is significant or not, we need to know the measurement uncertainty of the laboratory. Unfortunately, the measurement uncertainty of the TÜPRAŞ laboratory cannot be shared with readers because of confidentiality. Nevertheless, it must be noted that this error can be further reduced by adding data preprocessing and feature selection methods.

7. Conclusions

In this paper, Support Vector Regression (SVR) was employed for estimating the 95% Boiling Point Temperature of the Heavy Diesel product of a real operating Atmospheric Distillation Column (ADC) in the İzmit Refinery of the Turkish Petroleum Refineries Corporation. Linear, Polynomial and Gaussian Radial Basis Function (Gaussian RBF) kernels were tested, and the SVR parameters were optimized by embedding k-fold cross validation in an optimizer such as Genetic Algorithm (GA) and Grid Search (GS). The performance of SVR was compared against the linear regression method already functioning in the ADC.

The SVR parameters were optimized in a time period ranging from 5 min to 10 min, but once the parameters are optimized, the SVR can be trained with the optimum parameters in approximately 350 ms. As a result, the SVR model can be updated with incoming laboratory analysis data within a second, while the SVR parameters can be updated periodically at monthly intervals.

The SVR training time is reduced by approximately 80%, from 1.9 s to 350 ms, by incorporating the simplified decision variables instead of using the Lagrange multipliers as decision variables. This improvement is achieved by creating sparseness in the second-degree part of the cost function.

Modified versions of SVR, a recent subject in Machine Learning, can be implemented to improve the SVR parameter optimization process. One of the promising versions is ν-Support Vector Regression, where the width of the insensitive region is optimized within the training process by fixing the fraction of support vectors in the training set. The fraction is a real number between 0 and 1 and can be optimized in the SVR parameter optimization step.

Also, there may be other known column dynamics correlating with the output parameter. The online measurements incorporated in the linear regression method and in this study were selected by experts of the process, but a principal component analysis (PCA) could be conducted to see the effects of other column measurements. We will look into the feature selection and generation issue separately in future research.
Finally, in this work only the heavy diesel 95%-point is focused on and estimated. However, there are many quality parameters to be predicted in the refinery process, such as the kerosene and naphtha 95%-points. We believe that very similar improvements can be achieved by using the same approach.

Declaration of Competing Interest

None.

CRediT authorship contribution statement

Ahmet Can Serfidan: Investigation, Data curation, Software, Validation, Visualization, Writing - review & editing. Firat Uzman: Conceptualization, Data curation, Formal analysis, Validation, Writing - original draft. Metin Türkay: Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Writing - review & editing.

References

Aizerman, A., Braverman, E.M., Rozoner, L., 1964. Theoretical foundations of the potential function method in pattern recognition learning. Autom. Remote Control 25, 821–837.
API, 2008. Basic Petroleum Data Book, 28. American Petroleum Institute.
Basak, K., et al., 2002. On-line optimization of a crude distillation unit with constraints on product properties. Ind. Eng. Chem. Res. 41 (6), 1557–1568.
Bennett, K.P., Mangasarian, O.L., 1992. Robust linear programming discrimination of two linearly inseparable sets. Optimiz. Methods Softw. 1 (1), 23–34.
Boser, B.E., Guyon, I.M., Vapnik, V.N., 1992. A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, Pennsylvania, USA. ACM, pp. 144–152.
Boston, J.F., Sullivan, S.L., 1974. A new class of solution methods for multicomponent, multistage separation processes. Can. J. Chem. Eng. 52 (1), 52–63.
Chang, C.-C., Lin, C.-J., 2011. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2 (3), 27.
Chitralekha, S.B., Shah, S.L., 2010. Application of support vector regression for developing soft sensors for nonlinear processes. Can. J. Chem. Eng. 88 (5), 696–709.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learn. 20 (3), 273–297.
Fletcher, R., Sainz de la Maza, E., 1989. Nonlinear programming and nonsmooth optimization by successive linear programming. Math. Program. 43 (1–3), 235–256.
Gary, J.H., Handwerk, G.E., Kaiser, M.J., 2010. Petroleum Refining: Technology and Economics. CRC Press.
Gunn, S.R., 1998. Support vector machines for classification and regression. ISIS Technical Report 14.
Hofeling, B.S., Seader, J.D., 1978. A modified Naphtali-Sandholm method for general systems of interlinked, multistaged separators. AIChE J. 24 (6), 1131–1134.
Inamdar, S.V., Gupta, S.K., Saraf, D.N., 2004. Multi-objective optimization of an industrial crude distillation unit using the elitist non-dominated sorting genetic algorithm. Chem. Eng. Res. Des. 82 (5), 611–623.
Khoury, F.M., 2014. Multistage Separation Processes, Fourth ed. Taylor & Francis.
Kumar, V., et al., 2001. A crude distillation unit model suitable for online applications. Fuel Process. Technol. 73 (1), 1–21.
Lahiri, S.K., Ghanta, K.C., 2009. Hybrid support vector regression and genetic algorithm technique – a novel approach in process modeling. Chem. Prod. Process Model. 4 (1), 14–23.
Lee, D.E., et al., 2005. Weighted support vector machine for quality estimation in the polymerization process. Ind. Eng. Chem. Res. 44 (7), 2101–2105.
Leffler, W.L., 1985. Petroleum Refining for the Nontechnical Person, 2nd ed. Pennwell Corp.
Li, L., Su, H., Chu, J., 2009. Modeling of isomerization of C8 aromatics by online least squares support vector machine. Chin. J. Chem. Eng. 17 (3), 437–444.
More, R.K., et al., 2010. Optimization of crude distillation system using Aspen Plus: effect of binary feed selection on grass-root design. Chem. Eng. Res. Des. 88 (2), 121–134.
Naphtali, L.M., Sandholm, D.P., 1971. Multicomponent separation calculations by linearization. AIChE J. 17 (1), 148–153.
Peng, D.-Y., Robinson, D.B., 1976. A new two-constant equation of state. Ind. Eng. Chem. Fundam. 15 (1), 59–64.
Redlich, O., Kwong, J., 1949. On the thermodynamics of solutions. V. An equation of state. Fugacities of gaseous solutions. Chem. Rev. 44 (1), 233–244.
Russell, R.A., 1983. A flexible and reliable method solves single-tower and crude distillation-column problems. Chem. Eng. 90 (21), 52–59.
Seo, J.W., Oh, M., Lee, T.H., 2000. Design optimization of a crude oil distillation process. Chem. Eng. Technol. 23 (2), 157–164.
Shi, Y., Eberhart, R., 1998. A modified particle swarm optimizer. In: Proceedings of the 1998 IEEE International Conference on Evolutionary Computation: IEEE World Congress on Computational Intelligence. IEEE.
Shokri, S., et al., 2015. Soft sensor design for hydrodesulfurization process using support vector regression based on WT and PCA. J. Cent. South Univ. 22 (2), 511–521.
Smola, A.J., Schölkopf, B., 1998. A tutorial on support vector regression. NeuroCOLT2 Technical Report Series.
Smola, A., Schölkopf, B., 2004. A tutorial on support vector regression. Stat. Comput. 14 (3), 199–222.
Soave, G., 1972. Equilibrium constants from a modified Redlich–Kwong equation of state. Chem. Eng. Sci. 27 (6), 1197–1203.
Vapnik, V., Lerner, A., 1963. Generalized portrait method for pattern recognition. Autom. Remote Control 24 (6), 774–780.
Vapnik, V.N., Chervonenkis, A.Y., 1968. On the uniform convergence of relative frequencies of events to their probabilities. Dokl. Akad. Nauk SSSR 181 (4), 781–783.
Yan, W., Shao, H., Wang, X., 2004. Soft sensing modeling based on support vector machine and Bayesian model selection. Comput. Chem. Eng. 28 (8), 1489–1498.
Yao, H., Chu, J., 2012. Operational optimization of a simulated atmospheric distillation column using support vector regression models and information analysis. Chem. Eng. Res. Des. 90 (12), 2247–2261.