
Engineering Applications of Computational Fluid Mechanics

ISSN: 1994-2060 (Print) 1997-003X (Online) Journal homepage: https://www.tandfonline.com/loi/tcfm20

Efficient river water quality index prediction considering minimal number of inputs variables

Faridah Othman, M.E. Alaaeldin, Mohammed Seyam, Ali Najah Ahmed, Fang Yenn Teo, Chow Ming Fai, Haitham Abdulmohsin Afan, Mohsen Sherif, Ahmed Sefelnasr & Ahmed El-Shafie

To cite this article: Faridah Othman, M.E. Alaaeldin, Mohammed Seyam, Ali Najah Ahmed,
Fang Yenn Teo, Chow Ming Fai, Haitham Abdulmohsin Afan, Mohsen Sherif, Ahmed Sefelnasr &
Ahmed El-Shafie (2020) Efficient river water quality index prediction considering minimal number of
inputs variables, Engineering Applications of Computational Fluid Mechanics, 14:1, 751-763, DOI:
10.1080/19942060.2020.1760942

To link to this article: https://doi.org/10.1080/19942060.2020.1760942

© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group

Published online: 05 Jun 2020.


Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=tcfm20
ENGINEERING APPLICATIONS OF COMPUTATIONAL FLUID MECHANICS
2020, VOL. 14, NO. 1, 751–763
https://doi.org/10.1080/19942060.2020.1760942

Efficient river water quality index prediction considering minimal number of inputs variables

Faridah Othman a, M.E. Alaaeldin b, Mohammed Seyam c, Ali Najah Ahmed d, Fang Yenn Teo e, Chow Ming Fai f, Haitham Abdulmohsin Afan g, Mohsen Sherif h,i, Ahmed Sefelnasr h and Ahmed El-Shafie a∗

a Civil Engineering Department, Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia; b Surveying Engineering Department,
Faculty of Engineering Sciences, Omdurman Islamic University, Khartoum, Sudan; c Department of Civil Engineering and Geomatics, Durban
University of Technology, Durban, South Africa; d Institute of Energy Infrastructure (IEI), Universiti Tenaga Nasional, Selangor, Malaysia; e Faculty
of Science and Engineering, University of Nottingham Malaysia, Selangor, Malaysia; f Institute of Sustainable Energy (ISE), Universiti Tenaga
Nasional (UNITEN), Selangor, Malaysia; g Institute of Research and Development, Duy Tan University, Da Nang, Vietnam; h National Water Center,
United Arab Emirates University, Al Ain, United Arab Emirates; i Civil and Environmental Eng. Dept., College of Engineering, United Arab
Emirates University, Al Ain, United Arab Emirates

ABSTRACT

The Water Quality Index (WQI) is the most common determinant of stream-flow quality. According to the Department of Environment (DOE, Malaysia), WQI is chiefly affected by six factors: chemical oxygen demand (COD), biochemical oxygen demand (BOD), dissolved oxygen (DO), suspended solids (SS), potential for hydrogen (pH), and ammoniacal nitrogen (AN). Understanding the inter-relationships between these variables and WQI can improve WQI prediction for better water resources management. The aim of this study is to create an input approach using Artificial Neural Networks (ANNs) that computes the WQI directly from the input parameters, instead of from the indices of the parameters, when one of the parameters is absent. The data were collected from the nine water quality monitoring stations in the Klang River basin, Malaysia. In addition, a comprehensive sensitivity analysis was carried out to identify the most influential input parameters. The model, which is based on the frequency distribution of the significant factors, showed exceptional ability to replicate the WQI and attained a very high correlation (98.78%). Furthermore, the sensitivity analysis showed that the most influential parameter affecting WQI is DO, while pH is the least influential. Additionally, the performance of the models shows that missing DO values caused a deterioration in accuracy.

ARTICLE HISTORY
Received 2 October 2019
Accepted 18 April 2020

KEYWORDS
Surface water hydrology; Artificial Neural Networks; modelling; water quality index

1. Introduction

Water plays an essential role as a main resource for human activities (industrial activities, agricultural activities, etc.) as well as for human existence (Wan Mohtar et al., 2019). It is a fundamental unit of our life and, therefore, the assessment, monitoring and preservation of water resources are highly recommended, especially for developing countries. Furthermore, the assessment of water resources is essential for river basin and water resources planning (Caddis et al., 2012). Letcher et al. (2007) defined water resources assessment as 'the process of assessing the source, scope, reliability, quantity and quality of water resources for the purposes of water resources utilization and management'. Water quality assessment is regarded as one of the most crucial steps for controlling water bodies. Moreover, the observation of data categorization, modeling and analysis are regarded as the most crucial steps for water quality assessment (Sharma et al., 2013). People depend on surface water for aquatic life support, water supply, recreation, fisheries and transportation, so the management of surface water resources is important. One of these resources is rivers, which are very important for agricultural, irrigation, industrial and residential use and for other human activities (Kido et al., 2009; Luo et al., 2011). The quality of river water is affected by various factors. Biological and physico-chemical parameters were used in many previous studies to assess river water quality (Atasoy et al., 2006; Avvannavar & Shrihari, 2008).

Certain approaches are used to assess river water quality, one of which is determining the WQI. It is a statistical tool which is used to convert a large amount

CONTACT Faridah Othman [email protected]; Haitham Abdulmohsin Afan [email protected]


∗ Present address: National Water Center, United Arab Emirates University, Al Ain, United Arab Emirates

© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.

of information on water quality into just one number (Stambuk-Giljanovic, 1999). This indicator is a single quantity that signifies a huge volume of data. In other words, WQI is a single numeric expression that represents complex information gathered from the water body. The index is scaled between 0 and 100, where a higher value indicates good water quality and a lower value indicates poor water quality (Rubio-Arias et al., 2012; Tao et al., 2019).

Among the water quality factors measured to assess river water status, only six are used to determine WQI: biochemical oxygen demand (BOD), dissolved oxygen (DO), suspended solids (SS), chemical oxygen demand (COD), ammoniacal nitrogen (AN), and potential for hydrogen (pH) (Hameed et al., 2017).

WQI methods are used by many countries to evaluate the general status of their rivers. Although the concepts are similar, the indices are not the same and differ from country to country (Tung & Yaseen, 2020).

From the WQI value, water quality can then be classified according to Table 1.

Table 1. The WQI classification.

WQI value     WQI class
93 – above    I
78 – < 93     II
52 – < 78     III
31 – < 52     IV
11 – < 31     V

Class I water is regarded as harmless for direct drinking; Class II is harmless for swimming but needs treatment before it is fit for drinking; Class III requires intense treatment before it can be used for drinking; Class IV can only be used for domestic animals and plants; and Class V cannot be used for any of the purposes mentioned above (Ho et al., 2019).

Predicting the ecological and environmental impacts of waste and pollutant disposal that result from land use changes and modification now appears to be more than a fundamental requirement for river engineering personnel. This calls for investigating ways and means of linking land use, pollutant loading and disposal, water quality and ecosystem impacts. The use of computer-based water quality models is widely accepted for this purpose. These models range from simple 'black box type' mass balance models, used as planning and screening tools, to commercially available dynamic and complex water quality models used primarily for strategic planning. Water quality modeling techniques have evolved into an accepted tool to support surface water management. Modeling techniques are used to carry out a systematic and methodical analysis, aiming at understanding cause and effect relationships and assessing the impact of changes in ambient water quality under various possible scenarios, such as changes in land use and loadings. Water-quality modeling is the linkage between the sources of pollution and the in-stream water quality processes of a given water body. In summary, a model is no more than a representation of the physical, chemical, and biological water-quality processes and mechanisms that occur in a water body.

AI techniques have recently been applied and accepted as an appropriate tool for modeling composite nonlinear phenomena in water body systems and hydrology. Recent investigations have used the capabilities of the artificial neural network (ANN) in modeling water resource variables (Heddam & Kisi, 2018; Malik et al., 2019; Sepahvand et al., 2019). Another study (Alizadeh et al., 2018) used machine learning to examine the effect of river flow on the quality of estuarine and coastal water. The application of AI models in the field of water resources has recently become prominent, and several AI models have been used in this field, such as neural networks, adaptive neuro-fuzzy inference systems, and other hybrid models (Olyaie et al., 2015).

The ANNs (i.e. Artificial Neural Networks) technique is an information-driven modeling technique with an adaptable statistical structure that is able to model complex and nonlinear correlations between input and output data sets without requiring an insight into the natural phenomena (Makarynska & Makarynskyy, 2008). ANNs have three or more layers: an input layer, one or more hidden layers, and an output layer. The only role of the nodes of the input layer is to convey the input information to the neurons of the first hidden layer (Elzwayie et al., 2017). The role of the output layer is to generate an output for the specified input (Liu & Chen, 2012). The in-between hidden layers (there can be just a single layer) act as a set of feature detectors. Determining a suitable network framework is one of the most crucial and complex tasks in modeling the system (Ghorbani et al., 2018; Karimi et al., 2013; Sudheer et al., 2002). Figure 1 shows a schematic representation of a general ANN model of this 3-layer system.

There are several ways to apply ANNs. Finding the best solution is a challenging task, since the model-maker has to systematically test a huge number of possibilities and make certain assumptions based on experience to keep the search within convenient proportions. Such assumptions are systematically represented and justified here: training algorithm, network topology, choice of input, and optimum network size (Anctil et al., 2004).

Figure 1. Schematic description of a three-layer ANN and of its elements and (mathematical) neurons.

There are several frameworks and types of ANNs, such as PNN (i.e. Probabilistic Neural Networks), GRNN (i.e. General Regression Neural Networks), RBF (i.e. Radial Basis Functions) and MLP (i.e. Multilayer Perceptron Network) (Maier et al., 2010; Yaseen et al., 2019). MLP is the most commonly used ANN, in which the neurons are connected in layers as shown in Figure 1. Each neuron is linked with neurons in consecutive layers. Every neuron in the output or the hidden layer obtains weighted inputs from each of the neurons in the preceding layer. Then, the effective incoming signal propagates forward, via a non-linear activation function, to the neurons in the subsequent layer. This means that each individual neuron performs two steps. First, it integrates the data from other neurons or from a source external to the system, often using a linear function. Then, it generates an output according to a prefixed activation function, such as a linear, cubic polynomial or sigmoid function. This input-to-output transformation within one neuron is fairly simple; the power and complexity of the ANN are achieved through the interaction among multiple neurons (Adamowski & Sun, 2010).

The main aim of this study is to develop an ANNs framework by examining the correlation between WQI and the six influencing parameters, which are COD, BOD, DO, pH, AN, and SS. An insight into the correlation between the WQI and these six factors can play a significant role in the development of an integrated model for managing river water quality and in assessing the overall status of the quality of the river water. Additionally, this study focuses on creating an input approach for the new model that computes the WQI directly from the input parameters, instead of from the indices of the parameters, when one of the parameters is absent.

Calculating WQI using traditional approaches depends on empirical equations that involve a high degree of approximation, which introduces uncertainty in the results. In addition, the WQI equation cannot be used when one of the parameters is missing. Thus, the ANN technique is an efficient and straightforward technique for computing and modeling WQI and for managing the issue of absent factors. This study can be regarded as among the few contributions to developing a model for understanding the correlation between the parameters and the WQI in the Klang River using ANNs.

2. Study area

Malaysia has several natural resources. Around 95% of Malaysian water resources belong to the inland river systems. As the country moves towards realizing its Vision 2020, water demand is rising sharply, and there is more pressure to preserve the current water resources and to find other courses of action to improve water quality. Swift industrialization, although reasonably well-planned and controlled, has put increased pressure on city areas, particularly in the Klang River basin, the most densely populated region of the nation (Bradley, 2010).

Figure 2. (a) Location of Klang River Catchment and (b) water quality stations.

The basin, as shown in Figure 2, includes Kuala Lumpur's Federal Territory, parts of the Hulu Langat, Gombak, Petaling, and Klang districts in the State of Selangor, as well as the municipal regions of Shah Alam, Petaling Jaya, and Ampang Jaya. The Klang River stems from the North-East region of Kuala Lumpur, which is abundant in mountains, spread over 25 km. Before reaching the Malacca Strait at Port Klang, and while crossing the Federal Territory as well as the region downstream of Kuala Lumpur, the Klang River is joined by 11 major tributaries. The basin of the Klang River is located within two states in Malaysia, Kuala Lumpur and Selangor. Over a length of 120 km, the Klang River drains an area of 1,288 km² from the steep mountain rain forests of the main Central Range, which runs along Peninsular Malaysia, towards the mouth of the river in Port Klang (Directors, 2002). The catchment area within Peninsular Malaysia and the locations of the monitoring stations are shown in Figure 2.

3. Methodology

3.1. Water quality monitoring data

The examination of water quality is a crucial component of the management, preservation and treatment of water bodies (Mei et al., 2011). Data on water quality were gathered from in-situ measurements and laboratory assessment. The most commonly measured parameters are those associated with water pollution caused by effluent discharge, sewage and land clearing. Parameters measured on site include DO and pH. Laboratory assessments were performed to determine the remaining water quality factors, i.e. COD, BOD, TSS and AN. Additional data were also obtained from the Department of Environment (DOE, Malaysia). Figure 2(b) shows the locations of the monitoring stations.

3.2. Calculation of the water quality indices

The DOE calculates the WQI from the six parameters mentioned before by using the equation below (Hameed et al., 2017):

\[
\mathrm{WQI} = 0.22\,SI_{DO} + 0.19\,SI_{BOD} + 0.16\,SI_{COD} + 0.16\,SI_{SS} + 0.15\,SI_{AN} + 0.12\,SI_{pH} \tag{1}
\]

where WQI = water quality index; SI_DO = sub-index of DO; SI_BOD = sub-index of BOD; SI_COD = sub-index of COD; SI_AN = sub-index of AN; SI_SS = sub-index of TSS; SI_pH = sub-index of pH.

The sub-indices in Equation (1) are calculated according to the best-fit relations given in Table 2.

Table 2. The calculations of the sub-index (Othman et al., 2012).

Sub-index   Value                                   Condition
SI_DO       0                                       DO < 8
            100                                     DO > 92
            -0.395 + 0.030 DO^2 - 0.00020 DO^3      8 < DO < 92
SI_BOD      100.4 - 4.23 BOD                        BOD < 5
            108 e^(-0.055 BOD) - 0.1 BOD            BOD > 5
SI_COD      -1.33 COD + 99.1                        COD < 20
            103 e^(-0.0157 COD) - 0.04 COD          COD > 20
SI_AN       100.5 - 105 AN                          AN < 0.3
            94 e^(-0.573 AN) - 5 |AN - 2|           0.3 < AN < 4
            0                                       AN > 4
SI_SS       97.5 e^(-0.00676 SS) + 0.05 SS          SS < 100
            71 e^(-0.0016 SS) - 0.015 SS            100 < SS < 1000
            0                                       SS > 1000
SI_pH       17.2 - 17.2 pH + 5.02 pH^2              pH < 5.5
            -242 + 95.5 pH - 6.67 pH^2              5.5 < pH < 7
            -181 + 82.4 pH - 6.05 pH^2              7 < pH < 8.75
            536 - 77.0 pH + 2.76 pH^2               pH > 8.75

3.3. Collecting data and construction of modeling data matrix

To develop a model for the WQI employing ANNs, it is essential to obtain data for modeling and training purposes. The training data must include a sufficient number of cases, each including values for the input as well as the output parameters. Knowledge of and proficiency in the problem domain provide a primary idea of which input parameters are influential (Jiang & Cotton, 2004).

As shown in Figure 2(b), the selected stations were chosen so that they are distributed over all tributaries and the mainstream, cover the whole basin area, and reflect the land use. The selection was also based on data availability. As mentioned previously, WQI is affected by six factors: DO, COD, BOD, pH, AN and SS. The required data were collected from the nine water quality monitoring stations in the Klang River basin from 1997 to 2007. The sampling was carried out by the Department of Irrigation and Drainage (DID) in Malaysia; at most of the stations in this study, samples were taken at least twice per week over the years, so the seasonality of sampling has been taken into consideration.

After the data were gathered, they were examined, validated and reorganized to create the matrix of training data, which should contain a sufficient number of cases, each having input and output values for the modeling. Rearranging the initially gathered data is necessary for building the matrix of model data. Data rearranging was performed using MS Excel software. The data matrix is regarded as the raw material required for the ANN model. Table 3 illustrates a part of the ANNs model matrix.
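The reorganization described above, one case per row with six input values and one output value, can be sketched as follows. The study performed this step in MS Excel; the Python below is only an illustration. The record layout mirrors Table 3, the incomplete record is made up, and skipping cases with missing values is an assumption that follows the complete-case rule stated in Section 3.4. All names are hypothetical.

```python
INPUT_NAMES = ["DO", "BOD", "COD", "SS", "pH", "AN"]
OUTPUT_NAME = "WQI"

# Three records copied from Table 3, plus one made-up incomplete record.
records = [
    {"station": "St.02", "date": "Jan-97", "DO": 0.0, "BOD": 9.4, "COD": 32.0,
     "SS": 890.0, "pH": 6.9, "AN": 9.3, "WQI": 34.3},
    {"station": "St.05", "date": "Jan-97", "DO": 2.1, "BOD": 6.0, "COD": 38.0,
     "SS": 94.0, "pH": 7.0, "AN": 9.1, "WQI": 44.4},
    {"station": "St.10", "date": "Jan-97", "DO": 5.5, "BOD": 2.8, "COD": 2.0,
     "SS": 47.0, "pH": 6.4, "AN": 0.2, "WQI": 67.2},
    # Hypothetical record with a missing DO value (None marks "missing").
    {"station": "St.XX", "date": "Jan-97", "DO": None, "BOD": 5.0, "COD": 20.0,
     "SS": 60.0, "pH": 7.1, "AN": 1.0, "WQI": 55.0},
]


def build_matrix(rows):
    """Return (inputs, targets), keeping only cases with complete data."""
    inputs, targets = [], []
    for row in rows:
        values = [row.get(name) for name in INPUT_NAMES]
        target = row.get(OUTPUT_NAME)
        if target is None or any(v is None for v in values):
            continue  # skip incomplete cases
        inputs.append(values)
        targets.append(target)
    return inputs, targets


X, y = build_matrix(records)
```

Here `X` holds one six-value input vector per complete case and `y` the matching WQI values.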

Table 3. Part of ANNs model matrix.

St. No.   Date     DO    BOD    COD    SS      pH    AN     WQI
St.02     Jan-97   0.0    9.4   32.0   890.0   6.9    9.3   34.3
St.03     Jan-97   0.0   10.6   37.0   250.0   6.6    9.0   39.0
St.04     Jan-97   0.0    6.5   14.0   157.0   7.0   18.5   47.5
St.05     Jan-97   2.1    6.0   38.0    94.0   7.0    9.1   44.4
St.06     Jan-97   2.1   11.3   32.0   120.0   7.1    4.2   41.6
St.07     Jan-97   2.9    8.5   43.0   200.0   7.4    1.8   45.2
St.08     Jan-97   2.0    9.8   45.0   160.0   9.0    1.5   41.5
St.09     Jan-97   2.4    6.5   17.0    80.0   7.2    3.5   48.7
St.10     Jan-97   5.5    2.8    2.0    47.0   6.4    0.2   67.2
St.07     Feb-97   4.1   17.2   31.0   195.0   7.2    3.1   38.8
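Equation (1), the sub-index relations of Table 2 and the classes of Table 1 can be sketched as a few small functions and checked against a row of Table 3. Several points are assumptions: Table 2 gives strict inequalities only, so the treatment of boundary values (for example pH exactly 7) is a choice made here; Table 2 does not state the unit of DO in its conditions, so the DO value is passed through as tabulated; and Table 1 lists no class below 11, so the sketch returns None there. Function names are illustrative.

```python
import math

# Weights of Equation (1).
WEIGHTS = {"DO": 0.22, "BOD": 0.19, "COD": 0.16, "SS": 0.16, "AN": 0.15, "pH": 0.12}


def si_do(do):
    if do <= 8:
        return 0.0
    if do >= 92:
        return 100.0
    return -0.395 + 0.030 * do**2 - 0.00020 * do**3


def si_bod(bod):
    if bod <= 5:
        return 100.4 - 4.23 * bod
    return 108 * math.exp(-0.055 * bod) - 0.1 * bod


def si_cod(cod):
    if cod <= 20:
        return -1.33 * cod + 99.1
    return 103 * math.exp(-0.0157 * cod) - 0.04 * cod


def si_an(an):
    if an <= 0.3:
        return 100.5 - 105 * an
    if an < 4:
        return 94 * math.exp(-0.573 * an) - 5 * abs(an - 2)
    return 0.0


def si_ss(ss):
    if ss <= 100:
        return 97.5 * math.exp(-0.00676 * ss) + 0.05 * ss
    if ss < 1000:
        return 71 * math.exp(-0.0016 * ss) - 0.015 * ss
    return 0.0


def si_ph(ph):
    if ph < 5.5:
        return 17.2 - 17.2 * ph + 5.02 * ph**2
    if ph < 7:
        return -242 + 95.5 * ph - 6.67 * ph**2
    if ph < 8.75:
        return -181 + 82.4 * ph - 6.05 * ph**2
    return 536 - 77.0 * ph + 2.76 * ph**2


def wqi(do, bod, cod, ss, ph, an):
    """Weighted sum of the six sub-indices, as in Equation (1)."""
    sub = {"DO": si_do(do), "BOD": si_bod(bod), "COD": si_cod(cod),
           "SS": si_ss(ss), "AN": si_an(an), "pH": si_ph(ph)}
    return sum(WEIGHTS[name] * sub[name] for name in WEIGHTS)


def wqi_class(value):
    """Class from Table 1; None below 11, where the table lists no class."""
    for lower, label in [(93, "I"), (78, "II"), (52, "III"), (31, "IV"), (11, "V")]:
        if value >= lower:
            return label
    return None


# Row St.09, Jan-97 of Table 3 (tabulated WQI: 48.7).
st09 = wqi(do=2.4, bod=6.5, cod=17.0, ss=80.0, ph=7.2, an=3.5)
st09_class = wqi_class(st09)
```

For this row the computed index comes out close to the tabulated 48.7, which places it in Class IV.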

Table 4. Mean, standard deviation and range of variables used in modeling.

                                                                            Range
Variable                     Sym.   Type     Unit    Mean     Std. Dev.   Min.     Max.
dissolved oxygen             DO     Indep.   mg/l      2.64      1.98      0.00       8.29
biochemical oxygen demand    BOD    Indep.   mg/l     10.69      7.73      1.00      55.00
chemical oxygen demand       COD    Indep.   mg/l     45.82     22.01      2.00     179.00
suspended solids             SS     Indep.   mg/l    130.39    143.24      1.00    1000.00
potential for hydrogen       pH     Indep.   Unit      7.06      0.34      5.34       7.96
ammoniacal nitrogen (NH3-N)  AN     Indep.   mg/l      4.72      3.02      0.00      19.00
Water Quality Index          WQI    Dep.     Unit     50.82     14.26     17.39      93.13
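Scaling every variable to the interval [0, 1], as Section 3.4 describes, is commonly done with min-max normalization. The sketch below assumes that form, since the paper does not state the formula, and uses the minimum and maximum values from Table 4. The function and dictionary names are illustrative.

```python
# (min, max) per variable, taken from Table 4.
RANGES = {
    "DO": (0.00, 8.29),
    "BOD": (1.00, 55.00),
    "COD": (2.00, 179.00),
    "SS": (1.00, 1000.00),
    "pH": (5.34, 7.96),
    "AN": (0.00, 19.00),
    "WQI": (17.39, 93.13),
}


def normalize(name, value):
    """Min-max scaling: the minimum maps to 0 and the maximum to 1."""
    lo, hi = RANGES[name]
    return (value - lo) / (hi - lo)


def denormalize(name, scaled):
    """Inverse of normalize(), e.g. to turn a network output back into a WQI."""
    lo, hi = RANGES[name]
    return lo + scaled * (hi - lo)


# Row St.10, Jan-97 of Table 3.
row = {"DO": 5.5, "BOD": 2.8, "COD": 2.0, "SS": 47.0, "pH": 6.4, "AN": 0.2, "WQI": 67.2}
scaled_row = {name: normalize(name, value) for name, value in row.items()}
```

Values outside the ranges of Table 4 would fall outside [0, 1] under this scheme, so new data may need clipping or refreshed ranges; that choice is left open here.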

3.4. Analysis of ANNs model data

Before modeling, the data were analyzed to check their normality. Considering only those cases with complete numeric data for all parameters, around 941 cases fulfill this condition in the period 1997–2005. The data for the years 2006 and 2007 are used to assess the ANNs model. An ANNs model can perform well over the whole space only when the training data are uniformly or semi-uniformly distributed over the entire range of values of each parameter. As the current data were gathered from a limited number of sources (9 stations), they possibly comprise clusters. Thus, the distribution of every parameter across its range was tested. The mean, standard deviation and range of all the parameters are displayed in Table 4. The table shows a high fluctuation in the range of these statistics. This variation in the statistics of the parameters could influence the performance of the ANN model. To simplify the modeling process, all data parameters were normalized between 0 and 1.

3.5. Building ANNs model

The Artificial Neural Networks model was constructed using SNN (i.e. STATISTICA Neural Networks). The technical steps in constructing and applying an ANNs model differ between the tools used to build it.

Using SNN, the technical steps involved various procedures. First, the SNN data matrix was supplied to train the network by 'importing' or by using the data entry procedure. The input data represent the cases that the network uses to train itself. Then, the inputs (i.e. the independent variables) and the output (i.e. the dependent variable) for the Artificial Neural Networks model were given as shown in Table 3. Around half of the cases were chosen for training, and 25% each for calibration and testing. This is an arbitrarily selected split, and the expert can modify these percentages. The data set used for the test provides a way by which the network can recognize when it is time to stop the training and start calibration and testing (May et al., 2008). Further, the appropriate network model was selected from the available networks on the basis of the type of problem and the data. Following several trials, MLP was selected due to its strong ability to handle problems involving a high degree of nonlinearity and complexity. Once the network type is finalized, the conditions for ending the training procedure are set prior to training the network. Training was regulated by conditions such as the target performance, which specifies the tolerance between the NN prediction and the actual output, the maximum number of iterations, the minimum permissible gradient and the maximum running time. Calibration was performed to prevent memorization. The calibration serves as a criterion that signals that the network has been trained sufficiently, at which point the iteration process ends. When the training of the network has been completed satisfactorily, the network is further examined against a collection of cases that were not used during training.

The ANN models have been built based on several input combinations. The first model was based on all input

parameters, while the other models were based on only five inputs, with one of the inputs treated as missing each time. Therefore, five-input scenarios will be tested.

The outcomes are then presented statistically. Regression analysis was used to measure the extent of correlation between the network output and the actual output. A correlation factor (r) of 1 indicates an ideal model, while a value of 0 suggests an extremely bad model. Mathematically, the corresponding R² value can be represented by Equation (2) given below.

\[
R^2 = 1 - \frac{\sum_{i=1}^{n} (\mathrm{actual}_i - \mathrm{predicted}_i)^2}{\sum_{i=1}^{n} (\mathrm{actual}_i - \mathrm{mean})^2} \tag{2}
\]

After several trials, MLP was chosen as the most suitable neural network. It has three layers: an input layer with six neurons, a hidden layer with seven neurons, and an output layer with one neuron. The six input neurons are COD, BOD, DO, AN, SS and pH. The output neuron represents WQI. Additional performance indicators, such as MAE (i.e. mean absolute error), ABS (i.e. maximum and minimum absolute error), RMSE (i.e. root mean square error) and NRMSE (i.e. normalized root mean square error), have been used to assess the prediction accuracy of every model for better comparison.

4. Results and discussion

4.1. Regression statistics of ANNs model

In regression problems, the aim of the NN is to learn a mapping from the input parameters to an output parameter. A network is efficient at regression if it makes satisfactorily accurate predictions. SNN automatically computes the correlation coefficient (r) between the predicted and the actual outputs. An ideal prediction is one with a correlation coefficient of 1. A correlation of 1 does not necessarily imply a perfect prediction (it only signifies a prediction that is perfectly linearly correlated with the actual outputs), although in practice the correlation coefficient is an excellent performance indicator. It also provides a well-known and simple means of comparing the performance of NNs with the usual least squares linear fitting methods. Table 5 shows the regression statistics values for the ANNs model (Sanikhani et al., 2019; Yaseen et al., 2018).

Table 5. Values of regression statistics for ANNs model with all input parameters.

Regression statistics                                 All model data   Training data set   Calibration data set   Test data set
Average value (Data Mean)                             50.82            50.54               51.54                  50.66
Standard deviation (S.D)                              14.25            14.64               13.62                  14.07
Average error (Error Mean)                            0.12             0.08                −0.15                  0.46
Standard deviation of errors (Error S.D)              2.22             2.09                2.48                   2.15
Average absolute error (Abs E. Mean)                  1.29             1.28                1.36                   1.23
The error/data standard deviation ratio (S.D. Ratio)  0.16             0.14                0.18                   0.15
The correlation coefficient (r)                       98.78%           98.97%              98.33%                 98.83%
R²                                                    97.57%           97.95%              96.69%                 97.67%

The low values of S.D. Ratio, Error Mean and Abs E. Mean signify that the inaccuracy between the simulated and observed WQI values obtained by the ANNs model is not too high. The high value of the correlation coefficient (r) signifies that the simulated values of WQI obtained by using the ANNs model agree well with the observed values of WQI.

The correlation coefficient (r) between the observed and the predicted ANNs model output values is 98.78%, which gave an initial indication that the ANNs model is efficient and practical. Figure 3 shows a comparison of the observed WQI values and the simulated WQI values obtained by applying ANNs to all the modeling data. The observed and predicted WQI seem to be in good accord, with an R² of 0.9757.

The correlation has been tested for all stations, which represent the full range of the model data, for the period from January 1999 to June 1999, as shown in Figure 4. In addition, the comparison of the simulated and observed data for Station St.05 from 1997 to 2001 is presented in Figure 5. The results clearly show a high correlation between the observed and simulated WQI in both Figures 4 and 5. The ANNs model achieved high performance, as shown by the high similarity observed over the complete period and for all stations in Figures 4 and 5.

Since the data used in modeling are from 1997 to 2005, the ANNs model is tested using the data of year 2006. A comparison of the simulated WQI and the observed WQI for all stations in year 2006 is presented in Figure 6.

The observed and predicted WQI seem to be in good accord, with an R² of 0.9786. The correlation between the simulated and observed output values in year 2006 is 98.92%. It is a very high correlation and indicates that

Figure 3. Correlation between the simulated and the observed WQI for all modeling data.

Figure 4. (a) Comparison of the simulated WQI and the observed WQI for all stations in 1999 from January to June (b) Correlation
between the simulated WQI and the observed WQI for all stations in 1999 from January to June.
ENGINEERING APPLICATIONS OF COMPUTATIONAL FLUID MECHANICS 759

Figure 5. (a) Comparison of the simulated WQI and the observed WQI for station St.05 from 1997 to 2001 (b) Correlation between the
simulated WQI and the observed WQI for station St.05 from 1997 to 2001.

the ANNs model is useful and applicable to calculate available for the model, so the most significant param-
WQI.

Figure 6. (a) Comparison of the simulated WQI and the observed WQI in 2006; (b) correlation between the simulated WQI and the observed WQI in 2006.

4.2. Sensitivity analysis

Sensitivity analysis is used as an indicator of which input parameters are the most important. It is used for informative purposes only, but it can give significant insight into the usefulness of the individual parameters. It rates each parameter according to the decline in model performance that occurs when that parameter is no longer available to the model, and in doing so it assigns a single rating value to every parameter. Nonetheless, because the parameters are interdependent, no scheme of single ratings per parameter can fully reflect the intricacy of the actual situation.

The ratio given in Table 4 indicates the decline in the performance of the model when that parameter is not available to the model. The most significant parameter is DO, as it has the largest ratio (2.99), while pH is not significant, as its ratio is 1.01, implying that almost no decline in model performance will occur even if this parameter is no longer available to the model. The importance of the other variables is presented in Table 6.

Table 6. Values of regression statistics for final ANNs model.

Var.    DO     BOD    COD    SS     pH     AN
Ratio   2.99   2.11   1.59   1.62   1.01   2.13
Rank    1      3      5      4      6      2

4.3. ANNs models for input scenarios

The sensitivity analysis shows that the input parameters vary in their importance to the WQI estimation. If one of these parameters is missing, the WQI cannot be calculated with the WQI equation; therefore, ANN models based on five-input scenarios are presented, with one variable missing in each scenario. Table 7 presents the performance indicators for the six scenarios. The results clearly show that the model performs best when all input variables are used, which can be considered the benchmark for comparison with the other scenarios. The second column presents the modeling performance when AN is missing from the input parameters. The highest value of r and the lowest values of all other indicators represent the highest modeling accuracy. From Table 7, it can be concluded that the model with DO missing has the lowest accuracy of WQI estimation according to all performance indicators. Meanwhile, the best performance with five input variables is seen when pH is missing. Therefore, it can be concluded that the WQI can be estimated using the ANN model when pH is missing. These results

Table 7. The performance indicators for all input scenarios (each column from AN to SS is the model with that input missing).

Performance indicator   All inputs   AN       BOD      COD      DO       pH       SS
RMSE                    1.746        3.131    3.638    2.607    6.306    2.296    3.153
NRMSE                   0.020        0.035    0.042    0.030    0.072    0.026    0.036
MAE                     1.218        2.245    2.852    1.954    5.053    1.553    2.441
NMAE                    0.014        0.026    0.033    0.022    0.058    0.018    0.028
Min Abs Error           0.0003       0.001    0.003    0.006    0.005    0.0004   0.004
Max Abs Error           17.867       20.813   21.065   20.138   22.648   17.920   18.136
r                       0.988        0.976    0.967    0.980    0.912    0.983    0.973
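As an illustration of how the scenario results in Table 7 relate to the sensitivity ranking in Table 6, the short Python sketch below divides the RMSE of each reduced-input model by the RMSE of the all-input benchmark and ranks the results. It is a minimal sketch only: the paper does not state which error measure underlies the sensitivity ratio, so the use of RMSE here is an assumption, and the function names `degradation_ratios` and `rank_descending` are hypothetical helpers introduced for this example. All numbers are copied from Tables 6 and 7.

```python
# RMSE values copied from Table 7 (ANN model with one input missing).
rmse_all_inputs = 1.746
rmse_missing = {
    "AN": 3.131, "BOD": 3.638, "COD": 2.607,
    "DO": 6.306, "pH": 2.296, "SS": 3.153,
}

# Sensitivity ratios copied from Table 6.
table6_ratio = {
    "DO": 2.99, "BOD": 2.11, "COD": 1.59,
    "SS": 1.62, "pH": 1.01, "AN": 2.13,
}


def degradation_ratios(baseline, scenario_errors):
    """Error of each reduced model divided by the all-input baseline.

    Assumed definition for illustration; the paper does not give the formula.
    """
    return {name: err / baseline for name, err in scenario_errors.items()}


def rank_descending(scores):
    """Return a rank per key, where rank 1 is the largest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


ratios = degradation_ratios(rmse_all_inputs, rmse_missing)
rank_from_table7 = rank_descending(ratios)
rank_from_table6 = rank_descending(table6_ratio)

for name in sorted(ratios, key=ratios.get, reverse=True):
    print(
        f"{name:>3}: RMSE ratio {ratios[name]:.2f}, "
        f"rank {rank_from_table7[name]} (Table 6 rank {rank_from_table6[name]})"
    )
```

Under this assumed definition, both rankings place DO first and pH last, while the order of AN, BOD and SS in the middle differs slightly between the two tables.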

from the ANN models are consistent with the sensitivity analysis, which ranks the importance of each input parameter as shown in Table 6.

5. Conclusions

In this work, an artificial neural network model has been formulated for estimating the WQI of the Klang River on the basis of varying numbers of water quality parameters: BOD, DO, AN, COD, SS and pH. The methodology has proven its ability to estimate the WQI with high accuracy when one of the parameters is missing.

The new model delivered good results, as shown by the high correlation between the observed and predicted values of WQI. The correlation coefficient (r) between the observed values and the values computed by the ANNs model is 98.78%. This high value of r indicates that the WQI values computed by the ANNs model were in good agreement with the observed WQI, and that the ANNs model is useful and applicable.

The ANNs model results and the sensitivity analysis showed that the most important variable affecting the WQI is DO and that pH is the least important. Although the accuracy of WQI estimation deteriorates when DO is missing, it remains within acceptable limits. For more accurate prediction of the WQI, hybrid ANN models or advanced AI models can be employed in future research.

Acknowledgments

We would like to express our thanks to the Department of Irrigation and Drainage, and the Department of Environment, Malaysia for their co-operation in performing this study. We would also like to extend our gratitude to the University of Malaya Research Grants (RU001-2017C, GPF070A-2018, RF015B-2018) and the 2020106TELCO grant by the Innovation & Research Management Center (iRMC), Universiti Tenaga Nasional (UNITEN) for providing the financial support for this study.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

This work was supported by Universiti Malaya: [Grant Numbers RU001-2017C, GPF070A-2018, RF015B-2018]; Universiti Tenaga Nasional: [Grant Number 2020106TELCO].

ORCID

Faridah Othman http://orcid.org/0000-0002-4952-3676
M.E. Alaaeldin http://orcid.org/0000-0002-1561-297X
Mohammed Seyam http://orcid.org/0000-0002-2521-6508
Ali Najah Ahmed http://orcid.org/0000-0002-5618-6663
Haitham Abdulmohsin Afan http://orcid.org/0000-0002-4957-756X
Fang Yenn Teo http://orcid.org/0000-0002-5529-1381
Chow Ming Fai http://orcid.org/0000-0002-0732-5575
Mohsen Sherif http://orcid.org/0000-0002-6368-8143
Ahmed Sefelnasr http://orcid.org/0000-0002-0281-6455
Ahmed El-Shafie http://orcid.org/0000-0001-5018-8505
