DOI: 10.5923/j.scit.20140403.02
East Coast Environmental Research Institute, University Sultan Zainal Abidin, Malaysia
Abstract Despite the importance of groundwater in Terengganu, Malaysia, its quality assessment has received little attention, and efforts to use hydrochemical data to solve particular problems are even fewer or non-existent. This paper reports results from a large hydrochemical data set analysed using multivariate statistical techniques, namely Cluster Analysis (CA), Discriminant Analysis (DA) and Principal Component Analysis (PCA), with the objectives of determining the spatial variability of groundwater quality and identifying the sources of pollution that presently affect the groundwater. The water quality data were monitored at ten different wells over a period of six years (2006-2011) using 24 water quality parameters. CA grouped the sampling wells into three clusters, reflecting differences in water quality at different locations. DA, used as a data reduction technique to evaluate spatial variability in water quality, required only 3 parameters (Ca²⁺, NO₂⁻ and pH), affording 73.33% correct assignation to discriminate between the clusters using forward stepwise mode from the original 24 parameters, while backward stepwise mode yielded 83.33% correct assignation using nine parameters (Ca²⁺, Mg²⁺, Fe²⁺, SO₄²⁻, Cl⁻, As, Mn, NO₂⁻ and conductivity). PCA was used to examine the origin of each water quality parameter, natural or anthropogenic, based on the three cluster regions. It identified eight PCs, responsible for 76.45% of the total variance in the data set. The main factors obtained indicate that the parameters influencing groundwater quality in the clusters are mainly related to natural processes (dissolution of soil and rocks), point-source pollution (municipal wastewater and industries) and non-point-source pollution (agriculture) in the region. The results of this study clearly demonstrate the usefulness of multivariate statistical techniques in geochemistry.
Keywords Multivariate statistical techniques, Cluster analysis, Discriminant analysis, Principal component analysis
2.2. Data Collection and Treatment
Secondary data were used in this research work. The water quality data in this study were obtained for ten monitoring wells from the Department of Mineral and Geoscience, Terengganu. The ten wells were selected based on the availability of recorded data for the period 2006-2011: PT002, PT017, PT021, PT116, PT117, PT123, PT164, PT267, PT284 and PT300. Although 50 water quality parameters are recorded, only the 24 consistently sampled parameters were selected, giving a total of 60 samples and 1440 observations for the analysis.
The water quality data obtained from the Department of Mineral and Geoscience were in Notepad (plain text) format and were later converted into Microsoft Excel 2007 for all groundwater quality parameters. The monitoring wells were also sorted alphabetically (A-Z) in the normalised data set, while non-numerical variables were transformed into numerical variables for convenience of analysis. All statistical analyses were performed using Microsoft Excel 2007 and XLSTAT 2014.

2.3. Analytical Methods
Environmetric methods are deemed the best approach to avoid misinterpretation of large, complex environmental monitoring data (Simeonov et al, 2002). The most common environmetric methods used to determine spatial variability and to identify pollution sources are Cluster Analysis (CA), Discriminant Analysis (DA) and Principal Component Analysis (PCA).

2.3.1. Cluster Analysis
CA is a group of multivariate techniques which primarily classify variables or cases (observations or samples) into clusters with a high level of homogeneity within each class and a high level of heterogeneity between classes (Massart and Kaufman, 1983). The spatial variability of groundwater was determined by CA. CA was first performed to group all sampling sites into clusters, thereby reducing the number of sites to be considered. We use CA to link sampling sites in the configuration of a tree with different branches (dendrogram), which provides a visual summary of the clustering process, presenting a picture of the groups and their proximity. Branches linked closer to each other indicate a stronger relationship between samples/variables or clusters of sampling sites/variables.
In the present study, CA was applied to group the ten wells using Ward's linkage method (Ward, 1963). A classification scheme using Euclidean distance (the straight-line distance between two points in C-dimensional space defined by C variables) as the similarity measure, together with Ward's method for linkage, produces the most distinctive groups, in which each member within a group is more similar to its fellow members than to any member outside the group (Guler et al, 2002).

2.3.2. Discriminant Analysis
The main objective of DA is to discriminate between two or more groups in terms of the discriminating variables. It was performed on the data set in three different modes, i.e. standard, forward stepwise and backward stepwise, to construct the best discriminant functions (DFs), to confirm the three clusters determined by CA and to evaluate spatial variation in groundwater quality in Terengganu, Malaysia. In forward stepwise mode, variables are included step by step, beginning with the most significant, until no changes are obtained, whereas in backward stepwise mode, variables are removed step by step, beginning with the least significant, until significant changes are obtained. The membership of a well in cluster 1, 2 or 3 was the dependent variable, whereas all the measured parameters constituted the independent variables.

2.3.3. Principal Component Analysis
PCA was used as the method of factor extraction; for this study it requires a preceding estimate of the amount of variation in each groundwater quality parameter explained by the factors. Eigenvalues are the amount of variance explained by each factor; each parameter had a variance of 1, giving a total variance of 24 for the entire data set. A factor with an eigenvalue >1 explains more of the total variation in the data than an individual groundwater quality parameter, and a factor with an eigenvalue <1 explains less than an individual variable; therefore only factors with an eigenvalue >1 were retained for interpretation. The retained factors were subjected to varimax rotation (Kaiser, 1960; Vega et al, 1998).
Varimax rotation is an orthogonal rotation method that minimises the number of variables that have high loadings on each factor. Varimax factor (VF) loadings greater than 0.75 are considered strong and indicate that a high proportion of the variable's variance is explained by the factor; loadings between 0.50 and 0.75 are considered moderate, and loadings of 0.30-0.50 weak, indicating that much of that attribute's variance remains unexplained and that it is less important (Reghunath et al, 2002).

3. Results and Discussion

3.1. Descriptive Statistics
Basic statistics were computed to give initial information about the water quality data. Table 1 shows the descriptive statistics of the water quality variables measured over the six years.
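Section 2.2 describes a normalised 60 × 24 data matrix (1440 observations), and Section 2.3.3 states that each parameter then has a variance of 1, for a total variance of 24. A minimal Python sketch of that preprocessing and of the column-wise descriptive statistics follows. It assumes z-score standardisation (the text does not name the normalisation method) and uses a synthetic matrix in place of the study's data; the helper names `describe` and `zscore` are illustrative.

```python
import numpy as np

# Synthetic stand-in for the study's data matrix: 60 samples
# (10 wells x 6 years) by 24 parameters. NOT the Terengganu measurements.
rng = np.random.default_rng(seed=0)
raw = rng.lognormal(mean=1.0, sigma=0.5, size=(60, 24))


def describe(data: np.ndarray) -> dict[str, np.ndarray]:
    """Column-wise descriptive statistics (one value per parameter)."""
    return {
        "mean": data.mean(axis=0),
        "std": data.std(axis=0, ddof=1),  # sample standard deviation
        "min": data.min(axis=0),
        "max": data.max(axis=0),
    }


def zscore(data: np.ndarray) -> np.ndarray:
    """Assumed normalisation: zero mean and unit sample variance per column."""
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)


stats = describe(raw)
z = zscore(raw)

print(raw.size)  # 1440 observations
# Each standardised parameter has variance 1, so the total variance is 24.
print(round(float(z.var(axis=0, ddof=1).sum()), 6))
```

Standardising first keeps parameters measured in very different units (for example conductivity and trace metals) from dominating the distance-based CA and the correlation-based PCA.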
Science and Technology 2014, 4(3): 42-49 45
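The Ward's-linkage clustering with Euclidean distance described in Section 2.3.1 can be sketched in plain NumPy. This is an illustrative naive implementation, not the XLSTAT routine used in the study: it assumes each well is summarised by a single standardised profile (for example its mean over the six years), and the profiles below are synthetic, with hypothetical labels W01 to W10.

```python
import numpy as np


def ward_clusters(points: np.ndarray, n_clusters: int) -> list[list[int]]:
    """Naive agglomerative clustering with Ward's linkage (Ward, 1963).

    Every row starts as its own cluster; at each step the pair of clusters
    whose merger causes the smallest increase in the within-cluster sum of
    squared Euclidean distances is merged, until n_clusters remain.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = points[clusters[a]].mean(axis=0) - points[clusters[b]].mean(axis=0)
                # Increase in error sum of squares caused by merging a and b.
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Synthetic, hypothetical well profiles built to form three groups.
labels = [f"W{i:02d}" for i in range(1, 11)]
group_of = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
rng = np.random.default_rng(seed=1)
profiles = np.array([3.0 * g + rng.normal(0.0, 0.3, size=24) for g in group_of])

for members in ward_clusters(profiles, n_clusters=3):
    print([labels[i] for i in members])
```

Where SciPy is available, `scipy.cluster.hierarchy.linkage(profiles, method="ward")` followed by `fcluster(Z, t=3, criterion="maxclust")` applies the same criterion more efficiently, and the linkage matrix `Z` can be passed to `dendrogram` to draw the tree described in Section 2.3.1.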
Cluster 1 includes four wells (PT002, PT017, PT021 and PT164), classified as less polluted (LP) wells. Cluster 2 includes four wells (PT116, PT117, PT123 and PT267), classified as moderately polluted (MP), and cluster 3 contains two wells (PT284 and PT300), classified as highly polluted (HP). The clustering of wells indicates that groundwater quality varies smoothly, and such variation is likely due to the natural hydrogeological environment and the multipurpose nature of the study area. Omo-Irabor et al (2008) also suggest that the multipurpose nature of land use and its effects on groundwater quality hamper the precise spatial classification of monitoring wells. The outcome indicates that, for rapid evaluation of groundwater quality, only one well in each cluster is needed to give a logical and accurate spatial assessment of water quality for the whole network. The CA technique thus reduces the need for numerous sampling stations: monitoring three wells that represent the three different regions is sufficient. Figure 2 shows the three regions given by CA and their possible pollution sources within the study area.

3.3. Discriminant Analysis
In order to determine the spatial variation of groundwater quality among the different wells, DA was performed on the original data of 24 parameters after classification into the three major clusters obtained from CA. Cluster groups (LP, MP and HP) were treated as dependent variables, while water quality parameters were treated as independent variables. DA was carried out in standard, forward stepwise and backward stepwise modes; the accuracies of spatial classification using the standard, forward stepwise and backward stepwise mode discriminant functions were 90.00%, 73.33% and 83.33% respectively (Table 2).
In forward stepwise mode, three parameters (Ca²⁺, NO₂⁻ and pH) were found to be the most significant variables discriminating the clusters, which means that these three parameters account for most of the expected spatial variation in groundwater quality. Backward stepwise mode, on the other hand, retained nine parameters (Ca²⁺, Mg²⁺, Fe²⁺, SO₄²⁻, Cl⁻, As, Mn, NO₂⁻ and conductivity) to discriminate the three clusters. Forward stepwise DA proved to be a useful tool for recognising the discriminant parameters in the spatial variation of groundwater quality, because in forward stepwise mode variables are included step by step, beginning with the most significant, until no changes are obtained. The spatial DA suggests that calcium, nitrite and pH were the most significant parameters for discriminating among the clusters yielded by CA and accounted for most of the expected spatial variation in groundwater quality. Thus, DA can determine the classification into predetermined groups.

3.4. Principal Component Analysis
PCA was performed on the normalised data set (24 parameters) to identify the major variables affecting groundwater quality. Factors with an eigenvalue of 1.0 or greater are considered significant, and factors with the highest eigenvalues are the most significant (Kim and Mueller, 1987); these are retained in order to understand the underlying data structure (Jackson, 1991), since the selected PCs carry more information than any single original variable. Eight major PCs were extracted, which together account for 76.45% of the variance of the original data structure. The results of the PCA are given in Table 3.
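The retention rule (eigenvalue > 1), the varimax rotation and the loading thresholds described in Sections 2.3.3 and 3.4 can be sketched as follows. This is a NumPy illustration under stated assumptions, not the XLSTAT procedure used in the study: components are taken from the correlation matrix of standardised data, the varimax routine is the common SVD-based iteration, the data are synthetic, and "negligible" is an added label for loadings below 0.30.

```python
import numpy as np


def pca_kaiser(z: np.ndarray):
    """PCA of standardised data; keep components with eigenvalue > 1."""
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # returned in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0  # Kaiser criterion
    # Loadings: correlations between parameters and retained components.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    pct_variance = 100.0 * eigvals[keep] / eigvals.sum()
    return eigvals, loadings, pct_variance


def varimax(phi: np.ndarray, gamma: float = 1.0, max_iter: int = 100,
            tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation of a (parameters x factors) loading matrix."""
    p, k = phi.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        lam = phi @ rotation
        target = lam**3 - (gamma / p) * lam @ np.diag(np.sum(lam**2, axis=0))
        u, s, vt = np.linalg.svd(phi.T @ target)
        rotation = u @ vt  # orthogonal, so communalities are preserved
        d = float(np.sum(s))
        if d_old != 0.0 and d / d_old < 1.0 + tol:
            break
    return phi @ rotation


def loading_strength(value: float) -> str:
    """Thresholds quoted from Reghunath et al. (2002) in Section 2.3.3."""
    a = abs(value)
    if a > 0.75:
        return "strong"
    if a >= 0.50:
        return "moderate"
    if a >= 0.30:
        return "weak"
    return "negligible"  # added label, not from the text


# Synthetic data: 60 samples of 24 parameters driven by 3 latent factors.
rng = np.random.default_rng(seed=2)
latent = rng.normal(size=(60, 3))
data = latent @ rng.normal(size=(3, 24)) + rng.normal(scale=0.5, size=(60, 24))
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

eigvals, loadings, pct = pca_kaiser(z)
rotated = varimax(loadings)
print(f"{loadings.shape[1]} components retained, {pct.sum():.2f}% of variance")
print(loading_strength(rotated[0, 0]))
```

Because the eigenvalues of a 24 × 24 correlation matrix sum to 24, the cut-off of 1 retains exactly those components that explain more than one parameter's share (1/24, about 4.2%) of the total variance.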
Table 2. Classification matrix for DA of spatial variation of the groundwater in Terengganu
Region assigned by DA
Sampling regions %Correct HCL LCL MCL Total
Standard mode
HCL 91.67% 11 0 1 12
LCL 91.67% 0 22 2 24
MCL 87.50% 0 3 21 24
Total 90.00% 11 25 24 60
Forward stepwise
HCL 58.33% 7 0 5 12
LCL 83.33% 0 20 4 24
MCL 70.83% 0 7 17 24
Total 73.33% 7 27 26 60
Backward stepwise
HCL 66.67% 8 0 4 12
LCL 87.50% 0 21 3 24
MCL 87.50% 0 3 21 24
Total 83.33% 8 24 28 60
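The percentages in Table 2 follow directly from the counts: each region's %Correct is its diagonal count divided by its row total, and the overall figure is the sum of the diagonal divided by the 60 samples. A short check in Python, with the counts transcribed from Table 2 (the helper names are illustrative):

```python
# Classification matrices transcribed from Table 2.
# Rows: sampling region; columns: region assigned by DA (HCL, LCL, MCL).
TABLE_2 = {
    "Standard mode": [[11, 0, 1], [0, 22, 2], [0, 3, 21]],
    "Forward stepwise": [[7, 0, 5], [0, 20, 4], [0, 7, 17]],
    "Backward stepwise": [[8, 0, 4], [0, 21, 3], [0, 3, 21]],
}
REGIONS = ["HCL", "LCL", "MCL"]


def per_class_accuracy(matrix: list[list[int]]) -> list[float]:
    """Percentage of each region's samples assigned back to that region."""
    return [100 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix: list[list[int]]) -> float:
    """Correctly classified samples (diagonal) as a percentage of all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100 * correct / total


for mode, m in TABLE_2.items():
    per_class = ", ".join(f"{r} {a:.2f}%" for r, a in zip(REGIONS, per_class_accuracy(m)))
    print(f"{mode}: {per_class}; overall {overall_accuracy(m):.2f}%")
# Standard mode: HCL 91.67%, LCL 91.67%, MCL 87.50%; overall 90.00%
# Forward stepwise: HCL 58.33%, LCL 83.33%, MCL 70.83%; overall 73.33%
# Backward stepwise: HCL 66.67%, LCL 87.50%, MCL 87.50%; overall 83.33%
```

The same arithmetic shows where forward stepwise mode loses accuracy: it assigns five of the twelve HCL samples to MCL and seven of the 24 MCL samples to LCL.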
PC1 accounts for 26.29% of the total variance, showing strong positive loadings on Ca²⁺, Mg²⁺, HCO₃⁻, Cl⁻, dissolved solids and conductivity, a moderate positive loading on sodium and weak positive loadings on fluoride and pH. The high loading of conductivity is due to the active participation of dissolved ions in groundwater quality. The major variables constituting PC1 (Ca²⁺, Mg²⁺, HCO₃⁻, Na⁺) are related to hydrochemical variables originating from the mineralisation of groundwater. The presence of Cl⁻ may also be an indicator of point-source pollution by urban wastewater discharge, while pH is related to municipal waste.
PC2 accounts for 11.56% of the total variance and is mainly represented by NH₄⁺, with a strong positive loading, and phosphorus, with a weak positive loading. NH₄⁺ is closely related to the organic matter content of the sediment, and this high amount of nutrients might also result from the application of manure in agricultural activities (Terceiro et al, 2008).
Of the total variance, 9.63% is explained by PC3, which is mainly carried by Mn. The dissolution and weathering of minerals is mainly responsible for the release of Mn; however, its activity is also controlled by the redox level of the groundwater.
Additionally, PC4 explains 7.78% of the total variance of water quality, with strong positive loadings on SO₄²⁻ and As. Dissolution of gypsum and sodium sulphate minerals could increase the SO₄²⁻ concentration in groundwater. In general, the SO₄²⁻ content is low in groundwater, which reveals strongly reducing groundwater conditions. Moreover, the release of As from natural sources under reducing groundwater conditions has been reported (Chapagain et al, 2009).
PC5 explains 6.39% of the total variance of water quality in groundwater, with a strong positive loading on K⁺ and a moderate positive loading on Na⁺. The association between K⁺ and Na⁺ suggests the dissolution of calcite and dolomite affected by erosion and deposition from upland areas. K⁺ can be enriched in natural water due to the weathering of igneous and magmatic rocks; in the weathering of igneous rock, potassium feldspars are usually the main source of K⁺ ions.
PC6, explaining 5.85% of the total variance, has a strong
48 Usman Nasiru Usman et al.: Assessment of Groundwater Quality Using
Multivariate Statistical Techniques in Terengganu
negative loading on NO₃⁻ and is difficult to interpret. There are two possible explanations for this negative relation. First, the negative correlation with NO₃⁻ indicates that the concentration of NO₃⁻ is the result of different pollution processes involving industrial and municipal wastewater (Kannel et al, 2008) and the application of fertilisers and agricultural pesticides (Koh et al, 2010; Shrestha and Kazama, 2007). Kaown et al (2009) also showed that mineralisation of organic N fertiliser was the dominant source of nitrate in groundwater. Second, this factor can be interpreted as denitrification and nitrate reduction combined with other geochemical processes (Levins and Gosk, 2008).
PC7 and PC8 explain 4.59% and 4.34% of the total variance of water quality in groundwater respectively. PC7 mainly shows strong variation in colour, whereas CO₃²⁻ carries the major variation of water quality under PC8. Perhaps the most common cause of groundwater colour is the presence of minerals and organic matter: red and brown colours are due to iron, black to manganese or organic matter, and yellow to dissolved organic matter such as tannins. Natural processes such as the dissolution of carbonate minerals and of atmospheric and soil CO₂ gas could be mechanisms supplying CO₃²⁻ to the groundwater. It can also be related to atmospheric pollution from gaseous emissions into the atmosphere from petroleum-related industries and vehicular exhausts (Omo-Irabor et al, 2008).

4. Conclusions
The study has examined the quality of groundwater in Terengganu, Malaysia. The groundwater wells were classified as HP, MP and LP, and the data were analysed using multivariate statistical techniques to determine the spatial variability of groundwater and to identify the major variables affecting groundwater quality.
CA resulted in three main clusters of sampling sites with different characteristics. DA then identified only three parameters, i.e. Ca²⁺, NO₂⁻ and pH, affording 73.33% correct assignation to discriminate between the clusters using forward stepwise mode, from the original 24 parameters. Forward stepwise mode therefore proved useful in recognising the discriminant parameters in the spatial variation of groundwater quality, as it starts from the most significant variables rather than removing the least significant ones as backward stepwise mode does.
PCA was used to examine the origin of each water quality parameter, natural or anthropogenic, based on the three cluster regions. Eight varimax factors (VFs), accounting for 76.45% of the total variance in the data set, were found. The largest source of variation (26.29%) appears to come from water quality parameters associated with natural processes (dissolution of rocks), point-source pollution (industrial and municipal wastewater) and non-point-source pollution (mostly from agricultural activities). It is noteworthy that PCA confirms the results of CA and identifies the pollution sources.
Therefore, the results of this study clearly demonstrate the usefulness of multivariate statistical analysis in geochemistry. Additionally, these results may be used to reduce the number of samples analysed, both in space and time, without much loss of information. This will assist decision makers in identifying priorities to improve water quality that has deteriorated due to pollution from various anthropogenic activities.

ACKNOWLEDGEMENTS
The authors are thankful to the Department of Mineral and Geoscience Terengganu for providing the hydrochemical data of groundwater. They are also thankful to the Postgraduate School, Universiti Sultan Zainal Abidin, for supplying a vehicle for site visits and data collection.

REFERENCES
[1] Adam, M. J. (1998). The principle of multivariate data analysis. In P. R. Ashurst & M. J. Dennis (Eds.), Analytical methods of food authentication (p. 350). London: Blackie Academic & Professional.
[2] Berntell, A. World Water Week 2010, Stockholm. Stockholm International Water Institute (SIWI), Stockholm, Sweden.
[3] Brown, S. D., Skogerboe, R. K., & Kowalski, B. R. (1980). Pattern recognition assessment of water quality data: Coal strip mine drainage. Chemosphere, 9, 265-276. doi:10.1016/0045-6535(80)90003-X.
[4] Chapagain, S. K., Shrestha, S., Nakamura, T., Pandey, V. P., & Kazama, F. (2009). Arsenic occurrence in groundwater of Kathmandu Valley, Nepal. Desalination and Water Treatment, 4, 248-254.
[5] Guler, C., Thyne, G. D., McCray, J. E., & Turner, A. K. (2002). Evaluation of graphical and multivariate statistical methods for classification of water chemistry data. Hydrogeology Journal, 10, 455-474.
[6] Jackson, J. E. (1991). A user's guide to principal components. New York: Wiley.
[7] Juahir, H., Ekhwan, T. M., Zain, S. M., Mokhtar, M., Zaihan, J., & Ijankhushaida, M. J. (2008). The use of chemometrics analysis as a cost-effective tool in sustainable utilization of water resources in the Langat River Catchment. American-Eurasian Journal of Agricultural & Environmental Sciences, 4(1), 258-265.
[8] Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141-151. doi:10.1177/001316446002000116.
[9] Kannel, P. R., Lee, S., & Lee, Y. S. (2008). Assessment of spatial-temporal patterns of surface and ground water qualities and factors influencing management strategy of groundwater system in an urban river corridor of Nepal. Journal of Environmental Management, 86, 595-604.
[10] Kaown, D., Koh, D.-C., Mayer, B., & Lee, K.-K. (2009). Identification of nitrate and sulphate sources in groundwater using dual stable isotope approaches for an agricultural area with different land use (Chuncheon, mid-eastern Korea). Agriculture, Ecosystems & Environment, 132(3-4), 223-231.
[11] Kim, J. O., & Mueller, C. W. (1987). Factor analysis: Statistical methods and practical issues. Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-014. Beverly Hills: Sage Publications.
[12] Koh, D.-C., Mayer, B., Lee, K.-S., & Ko, K.-S. (2010). Land-use controls on sources and fate of nitrate in shallow groundwater of an agricultural area revealed by multiple environmental tracers. Journal of Contaminant Hydrology, 118, 62-78.
[13] Levins, I., & Gosk, G. (2008). Trace elements in groundwater as indicators of anthropogenic impact. Environmental Geology, 55, 285-290.
[14] Massart, D. L., & Kaufman, L. (1983). The interpretation of chemical data by the use of cluster analysis. New York: Wiley.
[15] National Research Council (2000). A review of the draft of the NCI-CDC Working Group to revise the "1985 Radioepidemiological Tables". Washington, DC: National Academy Press, 2101 Constitution Avenue, NW, Washington DC 20418.
[16] Nosrati, K., & Van Den Eeckhaut, M. (2012). Assessment of groundwater quality using multivariate statistical techniques in Hashtgerd Plain, Iran. Environmental Earth Sciences, 65(1), 331-344.
[17] Omo-Irabor, O. O., Olobaniyi, S. B., Oduyemi, K., & Akunna, J. (2008). Surface and groundwater water quality assessment using multivariate analytical methods: a case study of the Western Niger Delta, Nigeria. Physics and Chemistry of the Earth, Parts A/B/C, 33(8-13), 666-673.
[18] Reghunath, R., Murthy, S. T. R., & Raghavan, B. R. (2002). The utility of multivariate statistical techniques in hydrogeochemical studies: an example from Karnataka, India. Water Research, 36, 2437-2442.
[19] Shamsuddeen, M. K., Sefie, A., Normi, A., Tawnie, I., & Suratman, S. (2014). Impact of sea level rise to coastal groundwater at Kuala Terengganu, Terengganu. Hydrogeology Research Centre, National Hydraulic Research Institute of Malaysia, Lot 5377, Jalan Putra, 43300, Selangor, Malaysia.
[20] Shrestha, S., & Kazama, F. (2007). Assessment of surface water quality using multivariate statistical techniques: a case study of the Fuji river basin, Japan. Environmental Modelling & Software, 22, 464-475.
[21] Simeonov, V., Einax, J. W., Stanimirova, I., & Kraft, J. (2002). Environmetric modelling and interpretation of river water monitoring data. Analytical and Bioanalytical Chemistry, 374, 898-905.
[22] Terceiro, P., Lobo-Ferreira, J. P., & Leitão, T. E. (2008). Análise da qualidade da água e questões de governância na Albufeira do Alqueva. Comunicação apresentada no 9º Congresso da Água - Água: Desafios de hoje, exigências de amanhã. Cascais, Portugal. http://www.aprh.pt/congressoagua2008/PDF/Lobo-Ferreira Alqueva.pdf. Accessed 20 January 2009 (in Portuguese).
[23] Vega, M., Pardo, R., Barrado, E., & Debán, L. (1998). Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Research, 32(12), 3581-3592. doi:10.1016/S0043-1354(98)00138-9.
[24] Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236-244.
[25] WHO (1997). Guidelines for drinking-water quality, Second edition, Volume 3. Geneva: World Health Organization.