East Coast Environmental Research Institute, University Sultan Zainal Abidin, Malaysia
Abstract Despite the importance of groundwater in Terengganu, Malaysia, its quality assessment has received little
attention, and efforts to use hydrochemistry data to solve particular problems are fewer still or non-existent. This paper
reports results from a large hydrochemistry data set analysed using multivariate statistical techniques, namely Cluster Analysis
(CA), Discriminant Analysis (DA) and Principal Component Analysis (PCA), with the objectives of determining the spatial
variability of groundwater quality and identifying the sources of pollution that presently affect the groundwater. The water quality
data were monitored at ten different wells over a period of six years (2006-2011) using 24 water quality parameters. CA
grouped the sampling wells into three clusters, reflecting differences in water quality at different
locations. DA was used as a data reduction technique to evaluate spatial variability in water quality. In forward stepwise mode it needed only three
of the original 24 parameters (Ca2+, NO2-, and pH) to discriminate between the clusters, affording 73.33% correct
assignation, while backward stepwise mode yielded 83.33% correct assignation
using nine parameters (Ca2+, Mg2+, Fe2+, SO42-, Cl-, As, Mn, NO2-, and conductivity). PCA was used to examine the origin
of each water quality parameter, whether natural or anthropogenic, based on the three cluster regions. It identified
eight PCs, responsible for 76.45% of the total variance in the data set. The main factors obtained indicate that the parameters
influencing groundwater quality in the clusters are mainly related to natural processes (dissolution of soil and rocks), point-source
pollution (municipal wastewater and industries) and non-point-source pollution (agriculture) in the region. The results of this study
demonstrate the usefulness of multivariate statistical techniques in geochemistry.
Keywords Multivariate statistical techniques, Cluster analysis, Discriminant analysis, Principal component analysis
2.2. Data Collection and Treatment
Secondary data were used in this research work. The water quality data in this study were obtained from ten monitoring wells operated by the Department of Mineral and Geoscience, Terengganu. The ten monitoring wells were selected based on the availability of recorded data for the period 2006-2011. The ten wells are: PT002, PT017, PT021, PT116, PT117, PT123, PT164, PT267, PT284, and PT300. Although 50 water quality parameters were available, only the 24 consistently sampled parameters were selected, giving a total of 60 samples and 1440 observations for the analysis.
The water quality data obtained from the Department of Mineral and Geoscience were supplied in Notepad format and were then converted into Microsoft Excel 2007 for all groundwater quality parameters. The monitoring wells were also sorted (A-Z) in the normalised data set, while non-numerical variables were transformed into numerical variables for convenient analysis. All statistical analyses were performed using Microsoft Excel 2007 and XLSTAT 2014.

2.3. Analytical Methods
Environmetric methods are deemed to be the best approach to avoid misinterpretation of large, complex environmental monitoring data (Simeonov et al, 2002). The most common environmetric methods used to determine spatial variability and to identify pollution sources are Cluster Analysis (CA), Discriminant Analysis (DA), and Principal Component Analysis (PCA).

2.3.1. Cluster analysis
This is a group of multivariate techniques which primarily classify (Massart and Kaufmann, 1983) variables or cases (observations or samples) into clusters with a high level of homogeneity within each class and a high level of heterogeneity between classes. The spatial variability of groundwater was determined by CA. CA was first performed to group all sampling sites into clusters in order to minimize their number. We use CA to link sampling sites in the configuration of a tree with different branches (dendrogram), which provides a visual summary of the clustering process, presenting a picture of the groups and their proximity. Branches that have linkages closer to each other indicate a stronger relationship between samples/variables or clusters of sampling sites/variables.
In the present study, CA was applied to group the ten wells using Ward's linkage method (Ward 1963). A classification scheme using Euclidean distance (the straight-line distance between two points in C-dimensional space defined by C variables) for similarity measurement, together with Ward's method for linkage, produces the most distinctive groups, where each member within a group is more similar to its fellow members than to any member outside the group (Guler et al, 2002).

2.3.2. Discriminant Analysis
The main objective of DA is to discriminate between two or more groups in terms of the discriminating variables. It was performed on the data set in three different modes, i.e. standard, forward stepwise and backward stepwise modes, to construct the best discriminant functions (DFs) to confirm the three clusters determined by means of CA and to evaluate spatial variation in potable water quality in Terengganu, Malaysia. In forward stepwise mode, variables are included step by step, beginning with the most significant, until no changes are obtained, whereas in backward stepwise mode, variables are removed step by step, beginning with the least significant, until significant changes are obtained. The membership of a well in cluster 1, 2 or 3 was the dependent variable, whereas all the measured parameters constituted the independent variables.

2.3.3. Principal Component Analysis
PCA was used as a method of factor extraction; for this study it requires a preceding estimate of the amount of variation in each groundwater quality parameter explained by the factors. Eigenvalues are the amount of variance explained by each factor; each parameter had a variance of 1, with a total variance of 24 for the entire data set. A factor with eigenvalue >1 explains more of the total variation in the data than an individual groundwater quality parameter, and a factor with eigenvalue <1 explains less of the total variation than an individual variable. Therefore only factors with eigenvalue >1 were retained for interpretation, and the retained factors were subjected to varimax rotation (Kaiser, 1960; Vega et al, 1998).
Varimax rotation is an orthogonal rotation method that minimizes the number of variables that have high loadings on each factor. VF coefficients with a correlation greater than 0.75 are considered strong and indicate that a high proportion of the variable's variance is explained by the factor; values between 0.50 and 0.75 are considered moderate loadings, while values of 0.30-0.50 are considered weak factor loadings, indicating that much of that attribute's variance remains unexplained and that it is less important (Reghunath et al, 2002).

3. Results and Discussion

3.1. Descriptive Statistics
Basic statistics were carried out in order to give initial information about the water quality data. The table below shows the details of descriptive statistics on the water quality variables measured over six years.
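As an illustration of the clustering procedure described in Section 2.3.1, the following is a minimal sketch of agglomerative clustering with Ward's criterion on squared Euclidean distances, written in plain Python. It is not the XLSTAT procedure used in the study: the well names W1 to W6, their two-parameter values, and the function names are hypothetical and serve only to show how the merge rule works.

```python
from itertools import combinations


def centroid(points):
    """Mean vector of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(data, k):
    """Agglomerate `data` (dict name -> vector) into k clusters of names."""
    clusters = [[name] for name in data]
    while len(clusters) > k:
        # Merge the pair whose fusion raises within-cluster variance the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(
                [data[n] for n in clusters[ij[0]]],
                [data[n] for n in clusters[ij[1]]],
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations always yields i < j
    return [sorted(c) for c in clusters]


# Hypothetical standardized values of two parameters at six made-up wells.
wells = {
    "W1": [0.1, 0.2], "W2": [0.2, 0.1],
    "W3": [2.0, 2.1], "W4": [2.2, 1.9],
    "W5": [5.0, 5.2], "W6": [5.1, 4.9],
}
groups = ward_clusters(wells, k=3)
print(groups)
```

In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be used, since it also returns the merge heights needed to draw the dendrogram.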
3.2. Cluster Analysis
Cluster 1 includes four wells (PT002, PT017, PT021 and PT164) classified as less polluted (LP) wells. Cluster 2 includes four wells (PT116, PT117, PT123 and PT267) classified as moderately polluted (MP), and cluster 3 contains two wells (PT284 and PT300) classified as highly polluted (HP). The clustering of wells indicates that groundwater quality varies smoothly, and such variation is likely due to the natural hydrogeological environment and the multipurpose nature of the study area. Omo-Irabor et al (2008) also suggest that the multipurpose nature of land use and its effects on groundwater quality hamper the precise spatial classification of monitoring wells. The outcome indicates that, for rapid evaluation of groundwater quality, only one well in each cluster is needed to represent a logical, accurate spatial assessment of the water quality for the whole network. CA thus reduces the need for numerous sampling stations; monitoring from three wells that represent the three different regions is sufficient. Figure 2 shows the three regions given by CA and their possible pollution sources within the study area.

3.3. Discriminant Analysis
In order to determine the spatial variation of groundwater quality among different wells, DA was performed on the original data of 24 parameters after classification into the three major clusters obtained from the CA. Cluster groups (LP, MP and HP) were treated as the dependent variable, while water quality parameters were treated as independent variables. DA was carried out in standard, forward stepwise and backward stepwise modes; the accuracy of spatial classification using the standard, forward stepwise and backward stepwise mode discriminant functions was 90.00%, 73.33% and 83.33%, respectively.
In forward stepwise mode, three parameters were found to be the most significant variables that best discriminate the clusters (Ca2+, NO2- and pH), which means that these three parameters account for most of the expected spatial variation in groundwater quality. Backward stepwise mode, on the other hand, retained nine parameters (Ca2+, Mg2+, Fe2+, SO42-, Cl-, As, Mn, NO2- and conductivity) to discriminate the three clusters. Forward stepwise DA proved to be a useful tool for recognising the discriminant parameters in the spatial variation of potable water quality, because in forward stepwise mode variables are included step by step, beginning with the most significant, until no changes are obtained. The spatial DA suggests that calcium, nitrite and pH were the most significant parameters for discriminating among the clusters yielded by CA and accounted for most of the expected spatial variation in potable water quality. Thus, DA is a method that can determine the classification into predetermined groups.

3.4. Principal Component Analysis
PCA was performed on the normalized data set (24 parameters) to identify the major variables affecting groundwater quality. Factors with an eigenvalue of 1.0 or greater are considered significant, and factors with the highest eigenvalues are the most significant (Kim and Mueller, 1987); these are retained in order to understand the underlying data structure. Jackson (1991) states that the selected PCs are able to carry more information than a single original variable.
Eight major PCs were extracted, which accounted for 76.45% of the variance of the original data structure. The results for the PCs are given in Table 3.
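The eigenvalue rule and the loading thresholds stated in the methods can be sketched as follows. This is a minimal illustration, assuming standardized parameters (each with variance 1, so the total variance equals the number of parameters); the example eigenvalues are hypothetical, and the label "not significant" for loadings below 0.30 is an assumption, since the text does not name that range.

```python
def retain_factors(eigenvalues):
    """Keep factors with eigenvalue > 1 and report % variance explained.

    Assumes standardized data: each parameter has variance 1, so the total
    variance equals the number of parameters (= number of eigenvalues).
    """
    total = len(eigenvalues)
    kept = [ev for ev in sorted(eigenvalues, reverse=True) if ev > 1.0]
    return [(ev, 100.0 * ev / total) for ev in kept]


def loading_strength(loading):
    """Label a factor loading by its absolute size."""
    size = abs(loading)
    if size > 0.75:
        return "strong"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "weak"
    return "not significant"  # assumed label; the text gives none below 0.30


# Hypothetical eigenvalues for a 5-parameter data set (they sum to 5).
example = [2.4, 1.3, 0.7, 0.4, 0.2]
for ev, pct in retain_factors(example):
    print(f"eigenvalue {ev:.2f} explains {pct:.1f}% of total variance")

print(loading_strength(0.82), loading_strength(-0.60), loading_strength(0.35))
```

Under the same assumption, a component explaining 26.29% of the variance of 24 standardized parameters would correspond to an eigenvalue of about 0.2629 × 24 ≈ 6.3.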
Table 2. Classification matrix for DA of spatial variation of the groundwater in Terengganu

                                  Region assigned by DA
Sampling regions     % Correct    HCL    LCL    MCL    Total
Standard mode
  HCL                91.67%       11     0      1      12
  LCL                91.67%       0      22     2      24
  MCL                87.50%       0      3      21     24
  Total              90.00%       11     25     24     60
Forward stepwise
  HCL                58.33%       7      0      5      12
  LCL                83.33%       0      20     4      24
  MCL                70.83%       0      7      17     24
  Total              73.33%       7      27     26     60
Backward stepwise
  HCL                66.67%       8      0      4      12
  LCL                87.50%       0      21     3      24
  MCL                87.50%       0      3      21     24
  Total              83.33%       8      24     28     60
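The "% Correct" column of Table 2 can be reproduced from the counts alone. The sketch below assumes that each row holds the samples of one sampling region and each column the region assigned by DA; the function name is hypothetical, and the counts are copied from the forward stepwise block of the table.

```python
def percent_correct(matrix, labels):
    """Per-class and overall % of samples assigned to their own class.

    matrix[i][j] = number of samples from class labels[i] assigned to labels[j].
    """
    per_class = {}
    correct = total = 0
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        per_class[label] = round(100.0 * matrix[i][i] / row_total, 2)
        correct += matrix[i][i]
        total += row_total
    return per_class, round(100.0 * correct / total, 2)


labels = ["HCL", "LCL", "MCL"]
# Counts copied from the "Forward stepwise" block of Table 2.
forward = [
    [7, 0, 5],
    [0, 20, 4],
    [0, 7, 17],
]
per_class, overall = percent_correct(forward, labels)
print(per_class, overall)
```

The overall figure is the sum of the diagonal (7 + 20 + 17 = 44) divided by the 60 samples, which matches the 73.33% reported for forward stepwise mode.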
PC1 accounts for 26.29% of the total variance, showing strong positive loadings on Ca2+, Mg2+, HCO3-, Cl-, dissolved solids and conductivity, a moderate positive loading on sodium and weak positive loadings on fluorine and pH. The high loading of conductivity is due to the active participation of dissolved ions in the groundwater. The major variables constituting PC1 (Ca2+, Mg2+, HCO3-, Na+) are related to hydrochemical variables originating from the mineralization of groundwater. The presence of Cl- may also be an indicator of point-source pollution by urban wastewater discharge, while pH is related to municipal waste.
PC2 accounts for 11.56% of the total variance and is mainly contributed by NH4, with a strong positive loading, and phosphorus, with a weak positive loading. NH4 is closely related to the organic matter content of the sediment, and this high amount of nutrients might also result from the application of manure in agricultural activities (Terceiro et al, 2008).
PC3 explains 9.63% of the total variance and is mainly carried by Mn. The dissolution and weathering of minerals is mainly responsible for the release of Mn; however, this activity is also controlled by the redox level of the groundwater.
Additionally, 7.78% of the total variance of water quality is explained by PC4, with strong positive loadings on SO4 and As. Dissolution of gypsum and sodium sulphate minerals could increase the SO4 concentration in groundwater. In general, the SO42- content is low in groundwater, which reveals a higher level of reducing conditions in the groundwater. Moreover, the release of As from natural sources is reported under reducing groundwater environments (Chapagain et al, 2009).
PC5 explains 6.39% of the total variance of water quality in groundwater, with a strong positive loading on K+ and a moderate positive loading on Na+. The association between K+ and Na+ suggests the dissolution of calcite and dolomite affected by erosion and deposition from upland areas. K+ can be enriched in natural water due to the weathering of igneous rock and magmatic rocks. In the weathering of igneous rock, potassium feldspars are usually the main source of K+ ions.
PC6, explaining 5.85% of the total variance, has a strong negative loading on NO3- and is difficult to interpret. There are two possible explanations for this negative relation. First, the negative correlation with NO3- indicates that the concentration of NO3- is the result of different pollution processes involving industrial and municipal wastewater (Kennel et al, 2008), fertilizer and the application of agricultural pesticides (Koh et al, 2010; Shrestha and Kazama, 2007). Kaown et al (2009) also showed that mineralisation of organic N fertilizer was the dominant source of nitrate in groundwater. Second, this factor can be interpreted as denitrification and nitrate reduction combined with other geochemical processes (Levins and Gosk, 2008).
PC7 and PC8 explain 4.59% and 4.34% of the total variance of water quality in groundwater, respectively. PC7 is mainly characterised by strong variation in colour, whereas CO3 carries the major variation of water quality under PC8. Perhaps the most common cause of groundwater colour is the presence of minerals and organic matter. Red and brown colour is due to iron; black to manganese or organic matter and yellow to

usefulness of multivariate statistical analysis in geochemistry. Additionally, this result may be used to reduce the number of samples analysed both in space and time without much loss of information. This will assist decision makers in identifying priorities to improve water quality that has deteriorated due to pollution from various anthropogenic activities.

ACKNOWLEDGEMENTS
The authors are thankful to the Department of Mineral and Geoscience Terengganu for providing the hydrochemical data of groundwater. They are also thankful to the postgraduate school, Universiti Sultan Zainal Abidin, for supplying a vehicle for site visits and data collection.
Niger Delta, Nigeria. Phys Chem Earth, Parts A/B/C 33(8–13): 666–673.
[25] WHO (1997), World Health Organisation. Guidelines for drinking water quality, Geneva. Second Edition, Volume 3.