s13201-018-0684-z
https://doi.org/10.1007/s13201-018-0684-z
ORIGINAL ARTICLE
Abstract
Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of
the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western
half of Bengaluru city using multivariate statistical techniques. Water quality index rating was calculated for pre- and
post-monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show signs of
poorer quality for drinking purposes compared to pre-monsoon. Cluster analysis (CA), principal component analysis (PCA)
and discriminant analysis (DA) were applied to the groundwater quality data measured on 14 parameters from 67 sites
distributed across the city. Hierarchical cluster analysis (CA) grouped the 67 sampling stations into two groups, cluster 1
with high pollution and cluster 2 with lower pollution. Discriminant analysis (DA) was applied to delineate the most
meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal
DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and
post-monsoon seasons and accounts for 72% seasonal assignation of cases. Spatial DA identified Mg, Cl and NO3 as the
three most important parameters discriminating between two clusters and accounting for 89% spatial assignation of cases.
Principal component analysis was applied to the dataset obtained from the two clusters, which evolved three factors in each
cluster, explaining 85.4 and 84% of the total variance, respectively. Varifactors obtained from principal component analysis
showed that groundwater quality variation is mainly explained by dissolution of minerals from rock water interactions in
the aquifer, effect of anthropogenic activities and ion exchange processes in water.
Keywords Multivariate statistical techniques · Groundwater quality · Cluster analysis · Discriminant analysis · Principal
component analysis/factor analysis
43 Page 2 of 15 Applied Water Science (2018) 8:43
data on the water quality aspect (Mustapha and Aris 2012a). In order to help the administration prioritize and
make informed decisions to improve groundwater quality, it is very important to reduce the uncertainty in the
dataset by interpreting the spatial and temporal variations in water quality (Wang et al. 2008) and also to locate
hidden pollution sources (Zhang et al. 2009).

In recent years, a few data-driven approaches, such as the projection pursuit technique and neural networks, have
been used for assessing water quality (Salman and Ruka’h 1999). The Water Quality Index (WQI) is regarded as one of
the most effective ways to communicate water quality (Sadat-Noori et al. 2014). Horton (1965) suggested that
various water quality data could be aggregated into an overall index. However, multivariate statistical techniques
can be employed for analyzing large water quality datasets with minimal loss of important information (Juahir et al.
2011; Samson and Elangovan 2017; Shrestha and Kazama 2007). Multivariate statistical techniques, such as cluster
analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA), can
interpret complex data matrices for improved understanding of water quality and other environmental systems by
allowing the identification of possible factors/sources, thus serving as a valuable tool for quickly solving
pollution problems (Vega et al. 1998; Lee et al. 2001; Wunderlin et al. 2001; Reghunath et al. 2002; Simeonov et al.
2003, 2004; Ravikumar and Somashekar 2017). Principal component analysis (PCA) has been used to remove noise from
large data matrices and to group the variables into measurable components, discriminant analysis (DA) identifies
the variables that discriminate best between groups, and cluster analysis (CA) identifies groups of similar
observations within a data set. Characterization and evaluation of surface and freshwater quality by multivariate
statistical techniques has proved useful in verifying spatial and temporal variations caused by both natural and
human-induced factors (Helena et al. 2000; Singh et al. 2004, 2005; Hassen et al. 2016).

Bengaluru has grown rapidly since the information technology boom. As a consequence, the city and district
administrations are struggling to provide the necessary infrastructure. The demand for water supply in particular
requires scientific planning and effective management of water resources, especially the groundwater in the
district (CGWB 2012). In this study, groundwater quality data measured during the pre- and post-monsoon seasons on
14 parameters from 67 sites distributed across the western half of Bengaluru city were subjected to different
multivariate statistical approaches (CA, DA, PCA/FA) in order to evaluate the temporal and spatial variations in
groundwater quality and to recognize the likely factors causing these variations.

Materials and methods

Study area

The study area (Fig. 1) is situated in the northwestern and southwestern corner of Bengaluru city, between
12°48′24.52″ and 12°53′59.85″ North latitude and
77°24′59.95″ to 77°30′6.72″ East longitude, and covers an area of 241 km². It receives rainfall from both the
northeast and the southwest monsoons, with a total annual rainfall of around 900 mm. Bengaluru city is drained
mainly by part of the Arkavathi river catchment to the west and the South Pennar river to the east. The movement,
occurrence and recharge of groundwater are controlled by the degree of weathering, fracture pattern,
geomorphological setup and rainfall. The Bangalore urban district is underlain by crystalline basement, mainly
gneisses and granites intruded by basic dykes. These formations have been altered to laterite along the eastern
edge of the city. The city is heavily reliant on groundwater for its domestic and commercial needs. Assessment of
the groundwater resource shows that it is over-exploited. As a result of this overuse, groundwater quality has
also deteriorated (DMG 2003, 2011).

Hydrogeology

Granites and gneisses of the peninsular gneissic group form the primary aquifers in the study region. Laterites of
Tertiary age occur as isolated patches capping crystalline rocks. Alluvium of limited areal extent, 20–25 m thick,
occurs along the river courses and possesses substantial groundwater potential. Groundwater occurs under phreatic
(unconfined) conditions in the weathered zone and under semi-confined to confined conditions in fractured and
jointed rock formations. Groundwater movement and recharge of aquifers are controlled by various factors such as
fracture pattern, degree of weathering, geomorphological setup and amount of rainfall received. Resistivity
surveys revealed the presence of highly weathered (permeable) rock extending to a depth of 30 m. The principal
aquifer exists between 25 and 30 m depth. There are aquifers even beyond 60 m depth. The area slopes towards the
west. Streams of various watersheds originate in this area. A significant part of the study area is occupied by
streams flowing westward from this region (DMG 2011; CGWB 2012).

Monitored parameters

A total of 67 groundwater samples were collected in March 2014 for the pre-monsoon season and November 2014 for
the post-monsoon season. The sampling locations were selected to cover residential, industrial and commercial
areas so as to achieve a good sampling representation over the study area. The samples were collected from bore
wells after 10 min of pumping in pre-cleaned sterilized plastic bottles and stored in an ice box. The samples
collected were analyzed for 14 physico-chemical parameters, namely pH, total dissolved solids (TDS), electrical
conductivity (EC), nitrate (NO₃⁻), chloride (Cl⁻), sulfate (SO₄²⁻), magnesium (Mg²⁺), sodium (Na⁺), calcium (Ca²⁺),
potassium (K⁺), iron (Fe), alkalinity (HCO₃⁻), total hardness (TH) and fluoride (F⁻). Electrical conductivity and
pH were measured in the field immediately after sampling and the remaining parameters were determined in the
laboratory within 24 h.

Analysis methods

The sampling, preservation, transportation and analysis of water samples were performed according to standard
methods (APHA 2005). The analytical data quality was ensured through careful standardization, procedural blank
measurements and spiked and duplicate samples. Calcium and magnesium were determined by EDTA titration, sodium and
potassium by flame emission photometry, iron by phenanthroline spectrophotometry, bicarbonate and carbonate by
titrimetry, chloride by argentometric titration, nitrate by UV spectrometry, sulfate by nephelometry, total
dissolved solids by gravimetry, total hardness by potentiometry and fluoride by the ion-selective electrode
method. pH and electrical conductivity were measured in situ using a digital portable water analyser (Systronics
–371). The pH meter was calibrated by immersing the probe in two standard solutions (pH 4 and 10 buffers), while
the electrical conductivity meter was calibrated by immersing the probe in standard KCl solution (0.1 N). The
accuracy of the chemical analysis was verified by calculating ion-balance errors using Aquachem, where the errors
were generally around 5%.

Data pretreatment

The statistical analysis of data was carried out using SPSS software, v 20.0. Methods such as CA and FA require
variables to conform to a normal distribution. Normal distribution of data is an essential requirement for
multivariate statistical analyses because the analyses will be valid only if the standard deviations (variances)
are low (very close to 0). Otherwise, the parameters with the highest variances will dominate the analysis (Güler
et al. 2002; Cloutier et al. 2008; Yidana et al. 2011; Boateng et al. 2016). The raw data indicated that Ca, Na,
HCO₃⁻ and SO₄²⁻ were very close to a normal distribution, but the distribution pattern of the other parameters was
not normal. Hence the other parameters were log-transformed to bring them closer to a normal distribution (Zhang
et al. 2009). The standard z-scores of all the parameters were then used for the multivariate statistical analysis
to lessen the effects of differences in measurement units and variance and to render the data dimensionless
(Singh et al. 2005; Yidana et al. 2011). The z-scores were calculated as in Eq. (1):

$$z = \frac{x - \bar{x}}{s}, \qquad (1)$$
where x is the value of the parameter at a given sampling site, x̄ the mean and s the standard deviation of the
parameter.

Water quality index (WQI)

A WQI is a single number (like a grade) that expresses overall water quality at a certain location and time based
on several water quality parameters. The main purpose of a WQI is to turn complex water quality data into
information that is understandable and usable by the public. The WQI is a single dimensionless number on a
100-point scale that provides an indication of the quality of a water source (Pradhan et al. 2001; Pius et al.
2012). According to this water quality index, the maximum permissible value is 100. Values greater than 100
indicate pollution, and such water is unfit for human consumption. The methodology used to develop the WQI is
adopted from Tiwari and Mishra (1985) as in Eq. (2):

$$\mathrm{WQI} = \mathrm{antilog}\left[\sum_{i=1}^{n} W_i \log_{10} q_i\right], \qquad (2)$$

where the weightage factor is computed using $W_i = K/S_i$ and K is a proportionality constant derived from
Eq. (3):

$$K = \frac{1}{\sum_{i=1}^{n} 1/S_i}, \qquad (3)$$

where $S_i$ is the WHO/ICMR standard value of the ith water quality parameter. The quality rating is calculated
using $q_i = [(V_{actual} - V_{ideal})/(V_{standard} - V_{ideal})] \times 100$, where $q_i$ is the quality rating
of the ith parameter for a total of n water quality parameters, $V_{actual}$ is the value of the water quality
parameter obtained from laboratory analysis, $V_{ideal}$ is the value of the water quality parameter obtained from
the standard tables ($V_{ideal}$ is 7 for pH and zero for the other parameters) and $V_{standard}$ is the WHO/ICMR
standard of the water quality parameter. Based on the above WQI values, the groundwater quality is rated as
excellent, good, poor, very poor or unfit for human consumption (Table 3).

Cluster analysis (CA)

CA is a multivariate technique that groups objects based on their characteristics. It arranges the objects such
that every object is similar to the others in its cluster according to a predefined selection criterion. The
clusters of objects obtained should then display high internal (within-cluster) resemblance and high external
(between-cluster) diversity. Hierarchical agglomerative clustering is the most commonly used approach (Massart and
Kaufman 1983), which provides intuitive similarity relationships between any one sample and the entire data set.
It is represented by a dendrogram (tree diagram) (McKenna 2003). The dendrogram displays a visual summary of the
clustering process, presenting a picture of the groups and their proximity, with a considerable reduction in the
dimensionality of the original data. The Euclidean distance shows the similarity between two samples, and a
distance can be represented by the difference between analytical values from the samples (Forina et al. 2002;
Taoufik et al. 2017).

In this study, hierarchical agglomerative cluster analysis was conducted on the normalized data set using Ward’s
method, with the squared Euclidean distance as the measure of similarity. Ward’s method uses an analysis of
variance approach to evaluate the distances between clusters, in order to minimize the sum of squares (SS) of any
two clusters that can be formed at each step (Willet 1987; Adams 1998; Otto 1998; Tziritis et al. 2016). The
spatial variability of groundwater quality for the study area was determined from cluster analysis using the
linkage distance, which is reported as Dlink/Dmax. Dlink/Dmax represents the quotient of the linkage distance for
a particular case divided by the maximal linkage distance. To standardize the linkage distance, which is
represented on the y-axis, the quotient is then multiplied by 100 (Simeonov et al. 2003; Singh et al. 2005).

Discriminant analysis (DA)

DA is a supervised pattern recognition technique, which is used for the classification of objects or cases into
exhaustive and mutually exclusive groups based on a set of independent variables. It is a suitable statistical
technique when the dependent variable is categorical and the independent variables are metric (Mustapha and Aris
2012). The purpose of DA is to maximize the between-group variance relative to the within-group variance. DA
identifies the variables that discriminate between two or more naturally occurring groups (Johnson and Wichern
1992). It also forms a discriminant function (DF) for each group as in Eq. (4):

$$f(G_i) = k_i + \sum_{j=1}^{n} w_{ij}\, p_{ij}, \qquad (4)$$

where i indexes the groups (G), $k_i$ is the constant inherent to each group, n is the number of parameters used
to classify a set of data into a given group and $w_{ij}$ is the weight coefficient assigned by DA to a given
selected parameter ($p_{ij}$).

In the present study, DA was carried out on raw data using three different modes (standard, forward stepwise and
backward stepwise) to construct discriminant functions (DFs) and to assess both temporal and spatial variations in
groundwater quality. Temporal DA was done taking the monitoring period (pre-monsoon and
post-monsoon) as the grouping variable and the 14 measured groundwater quality parameters as the independent
variables. Spatial DA was done in the same way as temporal DA, by taking the spatial clusters obtained in cluster
analysis as the grouping variable and the 14 measured water quality parameters as the independent variables.

Principal component analysis/factor analysis

PCA is a technique that converts the original variables into new uncorrelated variables (axes), known as principal
components, which are linear combinations of the original variables (Sarbu and Pop 2005). The new axes lie in the
directions of maximum variance (Hossain et al. 2015). PCA identifies the most significant parameters describing
the whole data set, thereby reducing the data with minimal loss of original information (Helena et al. 2000). The
principal component (PC) can be expressed as in Eq. (5):

$$z_{ij} = a_{i1} x_{1j} + a_{i2} x_{2j} + a_{i3} x_{3j} + \cdots + a_{im} x_{mj}, \qquad (5)$$

where a is the component loading, z the component score, x the measured value of a variable, i the component
number, j the sample number and m the total number of variables.

PCA is followed by factor analysis. The objective of factor analysis is to reduce the contribution of unimportant
variables in order to further simplify the data structure obtained from PCA (Aris et al. 2012; Noshadi and
Ghafourian 2016). This objective can be accomplished by rotating the axes defined by PCA, according to
well-established rules, and generating new variables, called varifactors (VFs). A principal component is a linear
combination of observable water quality variables, whereas a varifactor can include unobservable, hypothetical,
latent variables (Vega et al. 1998; Helena et al. 2000; Qian et al. 2016). PCA of the normalized variables was
carried out to extract significant principal components and to further reduce the contribution of less significant
variables. The extracted principal components were then subjected to varimax rotation (raw), generating
varifactors (Brumelis et al. 2000; Love et al. 2004; Abdul-Wahab et al. 2005). As a result, a small number of
factors will usually account for approximately the same amount of information as the much larger set of original
observations. In FA, the basic concept is expressed as in Eq. (6):

$$z_{ji} = a_{f1} f_{1i} + a_{f2} f_{2i} + a_{f3} f_{3i} + \cdots + a_{fm} f_{mi} + e_{fi}, \qquad (6)$$

where z is the measured value of a variable, a the factor loading, f the factor score, e the residual term
accounting for errors or other sources of variation, i the sample number, j the variable number and m the total
number of factors.

Results and discussion

Groundwater chemistry

Basic statistics for all the physico-chemical parameters in the pre- and post-monsoon groundwater samples from the
study area and the corresponding permissible limits specified by the Bureau of Indian Standards (2012) are
presented in Table 1 and as box plots in Fig. 2a–c. The pH of groundwater in the study area varies from 6.07 to
8.13 in pre-monsoon and 5.8 to 7.7 in post-monsoon, indicating a slightly acidic to alkaline nature. This shows
that there is little seasonal fluctuation in pH in the area, with some values lower than the permissible range of
6.5–8.5. The electrical conductivity of groundwater varies widely, ranging from 240 to 4230 μS/cm in pre-monsoon
and 254 to 4483 μS/cm in post-monsoon. The total dissolved solids values varied between 152 and 2242 mg/L in
pre-monsoon and 162 and 2869 mg/L in post-monsoon. The electrical conductivity and total dissolved solids values
in many samples were well above their respective desirable limits of 1400 μS/cm and 500 mg/L, indicating the
presence and dissolution of higher salt content.

Water hardness is caused primarily by the presence of cations, such as calcium and magnesium, and anions, such as
carbonate, bicarbonate, chloride and sulfate, in water. Water hardness varied between 48 and 1784 mg/L in the
pre-monsoon period and 50 and 1873 mg/L in the post-monsoon period, thereby exceeding the desirable limit of
300 mg/L in many samples. Among the alkaline earths, the concentration of calcium is in the range of 6–312 mg/L in
pre-monsoon and 6–316 mg/L in post-monsoon, while magnesium content ranges between 8 and 244 mg/L in pre-monsoon
and 8 and 268 mg/L in post-monsoon, their higher concentrations indicating hardness in groundwater. Bicarbonate
is the predominant anion in both pre- and post-monsoon seasons, with concentrations varying from 88 to 505 mg/L in
pre-monsoon and 92 to 530 mg/L in post-monsoon. The higher concentration of bicarbonate may be attributed to
leaching of mineral substances in the soil and atmosphere during natural filtration of water from sewage
(Ravikumar et al. 2012).

Chlorides are in the range of 19–607 mg/L and 20–667 mg/L during pre-monsoon and post-monsoon, respectively,
indicating that there is not much difference in chloride concentration between seasons. The presence of chloride
in the groundwater of the study area is due to seepage from sewers, septic tanks and industrial effluents. The
nitrate concentration in the study area ranges from 2 to 252 mg/L in pre-monsoon and 2 to 262 mg/L in
post-monsoon seasons. Majority of the samples among
Table 1 Basic statistics of groundwater quality data of the study region

Parameter              Season         Min    Max    Mean   Skewness  Kurtosis  Standard (IS 10500:2012)
pH                     Pre-monsoon    6.07   8.13   6.94   0.96      1.26      6.5–8.5
                       Post-monsoon   5.80   7.70   6.60   0.96      1.26
EC (μS/cm)             Pre-monsoon    240    4230   1300   1.92      7.82      1400
                       Post-monsoon   254    4483   1378   1.92      7.82
Total Hardness (mg/L)  Pre-monsoon    48     1784   444    2.55      13.08     300
                       Post-monsoon   50     1873   466    2.55      13.08
TDS (mg/L)             Pre-monsoon    152    2422   770    1.83      7.17      500
                       Post-monsoon   162    2869   882    1.92      7.82
F (mg/L)               Pre-monsoon    0.11   1.38   0.38   1.80      4.42      1
                       Post-monsoon   0.11   1.40   0.39   1.80      4.42
NO3 (mg/L)             Pre-monsoon    2      252    51     2.08      7.06      45
                       Post-monsoon   2      262    53     2.08      7.06
Cl (mg/L)              Pre-monsoon    19     607    165    1.23      2.56      250
                       Post-monsoon   20     667    181    1.23      2.56
SO4 (mg/L)             Pre-monsoon    3      187    82     0.67      −0.01     200
                       Post-monsoon   3      201    88     0.67      −0.01
Fe (mg/L)              Pre-monsoon    0.05   4.30   1.10   1.23      0.29      0.3
                       Post-monsoon   0.05   4.60   1.20   1.25      0.32
K (mg/L)               Pre-monsoon    0.30   16     5.11   0.95      0.59      –
                       Post-monsoon   0.40   17     5.62   0.95      0.59
Na (mg/L)              Pre-monsoon    22     205    89     0.53      −0.09     100
                       Post-monsoon   23     223    97     0.53      −0.09
Mg (mg/L)              Pre-monsoon    8      244    48     3.37      18.75     30
                       Post-monsoon   8      268    53     3.37      18.75
Ca (mg/L)              Pre-monsoon    6      312    108    0.87      3.08      75
                       Post-monsoon   6      336    117    0.87      3.08
HCO3 (mg/L)            Pre-monsoon    88     505    325    −0.10     0.03      –
                       Post-monsoon   92     530    341    −0.10     0.03
pre-monsoon samples showed nitrate concentration above the permissible limit of 45 mg/L, which can be attributed
to contamination from septic tanks and sewage effluent, since the study area is urban and there is neither
agricultural activity nor application of nitrogenous fertilizers. Further, the fluoride concentration was found to
vary from 0.11 to 1.38 mg/L in pre-monsoon and 0.11 to 1.40 mg/L in post-monsoon, exceeding the desirable limit of
1 mg/L in some samples. The geology of the study area is dominated by granites/gneisses with an abundance of
pegmatites, which contributes to the occurrence of fluoride in bore wells.

Estimation of water quality index

In the present study, 12 water quality parameters (pH, TDS, hardness, F, Fe, K, SO4, NO3, Cl, Na, Ca and Mg) were
considered for computing WQI. It is well known that the more harmful a given pollutant is, the smaller is its
permissible value in the standard recommended for drinking water. So, the “weights” for the various water quality
parameters are assumed to be inversely proportional to the recommended standards for the corresponding parameters
(Pius et al. 2012). The calculated relative weight (Wi) values of each parameter are given in Table 2.

Water quality types were determined on the basis of WQI. The computed WQI values range from 19 to 145 and 24 to
164 for pre-monsoon and post-monsoon, respectively. The WQI ranges, water types and percentage of samples in each
class are given in Table 3. It can be observed that out of 67 groundwater quality data points, 24 stations (35%)
fall in the “excellent” category, 16 stations (23%) in the “good” category, 18 stations (26%) in the “poor”
category, 7 stations (10%) in the “very poor” category and 3 stations (4%) in the unfit category for the
pre-monsoon season. During post-monsoon, 20 stations (30%) fall in the “excellent” category, 16 stations (23%) in
the “good” category, 16 stations (23%) in the “poor” category, 9 stations (13%) in the “very poor” category and
5 stations (7%) in the unfit category
for the post-monsoon dataset. The post-monsoon samples show signs of poorer quality for drinking purposes compared
to pre-monsoon. Rainfall data published by the India Meteorological Department (IMD) revealed that the Bangalore
region received comparatively higher pre-monsoon rainfall and normal monsoon rainfall, while the rainfall in the
post-monsoon season was about 31% deficient when the data were collected. Thus the dilution effect of rainfall
recharge
is observed to be higher in the pre-monsoon season. Also, when the rainfall is deficient, there is a risk of
higher concentration of surface pollutants getting infiltrated into the groundwater.

Table 2 Water quality parameters, their standards and unit weights

Parameter   Standard   Weightage
pH          6.5–8.5    0.0509
Hardness    500        0.0008
TDS         500        0.0008
F           1          0.4328
NO3         45         0.0096
Cl          250        0.0017
SO4         200        0.0021
Fe          1          0.4328
K           10         0.0432
Na          100        0.0043
Mg          30         0.0144
Ca          75         0.0057

Table 3 The WQI range, type of water and percentage wise water quality index area distribution

Water quality index   Description                Percentage of the samples
                                                 Pre-monsoon   Post-monsoon
0–25                  Excellent                  35            30
26–50                 Good                       23            23
51–75                 Poor                       26            23
76–100                Very poor                  10            13
>100                  Unfit for drinking (UFD)   4             7
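
The index computation described by Eqs. (2) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the standards are copied from Table 2, and the upper pH limit of 8.5 is assumed to be the pH standard because it is the value that approximately reproduces the tabulated pH weightage. The source does not say how a non-positive quality rating (for example pH at or below 7, where the logarithm is undefined) is handled, so the sketch simply raises an error in that case.

```python
import math

# Standards as listed in Table 2. Using 8.5 for pH is an assumption
# (it approximately reproduces the pH weightage given in the table).
STANDARDS = {
    "pH": 8.5, "Hardness": 500, "TDS": 500, "F": 1, "NO3": 45, "Cl": 250,
    "SO4": 200, "Fe": 1, "K": 10, "Na": 100, "Mg": 30, "Ca": 75,
}


def unit_weights(standards):
    """Weight of each parameter: K / S, with K = 1 / sum(1 / S) as in Eq. (3)."""
    k = 1.0 / sum(1.0 / s for s in standards.values())
    return {name: k / s for name, s in standards.items()}


def quality_rating(name, value, standards):
    """Quality rating: 100 * (V_actual - V_ideal) / (V_standard - V_ideal)."""
    ideal = 7.0 if name == "pH" else 0.0  # ideal value is 7 for pH, 0 otherwise
    return 100.0 * (value - ideal) / (standards[name] - ideal)


def wqi(sample, standards=STANDARDS):
    """Weighted geometric aggregation of Eq. (2): antilog of sum(W * log10(q)).

    `sample` must provide a measured value for every parameter in `standards`.
    """
    weights = unit_weights(standards)
    log_sum = 0.0
    for name in standards:
        q = quality_rating(name, sample[name], standards)
        if q <= 0:
            # log10 is undefined here; the handling of this case is not
            # described in the source, so the sketch refuses to guess.
            raise ValueError(f"non-positive quality rating for {name}: {q}")
        log_sum += weights[name] * math.log10(q)
    return 10 ** log_sum


if __name__ == "__main__":
    # A hypothetical sample sitting exactly at every standard gives WQI close to 100.
    print(round(wqi(dict(STANDARDS)), 2))
```

Because the weights sum to one, a sample that sits exactly at every standard has a quality rating of 100 for each parameter and therefore an index of about 100, which matches the statement that 100 is the maximum permissible value. The two parameters with a standard of 1 (F and Fe in Table 2) carry most of the weight, so the index is dominated by their ratings.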
Spatial similarity and site grouping

CA for pre-monsoon and post-monsoon data provided a dendrogram grouping the 67 sampling sites into two
statistically important clusters (cluster 1 and cluster 2), containing 36 and 31 sites for cluster 1 and 34 and 33
sites for cluster 2, respectively, at (Dlink/Dmax) × 100 < 25 as shown in Fig. 3a, b. From the cluster
characteristics given in Table 4 it was observed that, for both pre-monsoon and post-monsoon data, the sampling
sites in cluster 1 showed a higher level of pollution than those in cluster 2. While the parameter concentrations
in cluster 2 are comparatively lower, some parameters still exceeded the desirable limits. Thus cluster 1
represents high pollution sites and cluster 2 represents low pollution sites. It can be seen that the CA
technique is helpful in providing a valid classification of
Fig. 3 Dendrogram showing sampling site clusters for a pre-monsoon data and b post-monsoon data
groundwater in the entire region. This will help in designing a future spatial sampling strategy in an optimal
manner, reducing the number of sampling sites in the monitoring network, which will reduce the cost without
affecting the significance of the outcome.

Spatial and temporal variations in groundwater quality

Discriminant analysis was used in order to identify the most important parameters influencing the spatial and
temporal variations in groundwater quality. Only 12 parameters were considered for DA, excluding TDS and EC to
avoid multicollinearity. Discriminant functions (DFs) and classification matrices (CMs) were derived from the
standard, forward stepwise and backward stepwise modes of DA. Temporal DA was performed on raw data taking season
(pre-monsoon and post-monsoon) as the grouping variable and the measured parameters as the independent variables.
The classification functions obtained are given in Table 5 and the classification matrix is given in Table 6.

Standard mode DA constructed DFs using all 12 parameters to give 76% correct assignation of cases in the CM. The
forward stepwise mode used only six parameters (K, Fe, HCO3, NO3, pH and F) giving 74% correct assignation and the
backward stepwise mode gave 72% correct assignation of cases using only one parameter (pH). Thus temporal DA
indicated that pH is the most important parameter, which discriminates between the water quality in the
pre-monsoon and post-monsoon seasons, followed by K, Fe, HCO3, NO3 and F.

Table 6 Classification matrix for temporal DA

Mode                Group   Percent correct   Pre-mon   Post-mon
Standard mode       Pre     72.06             49        19
                    Post    79.41             14        54
                    Total   75.74             63        73
Forward stepwise    Pre     64.71             44        24
                    Post    82.35             12        56
                    Total   73.53             56        80
Backward stepwise   Pre     67.65             46        22
                    Post    76.47             16        52
                    Total   72.06             62        74

Spatial DA was performed on raw data taking cluster (1 and 2) as the grouping variable and the measured parameters
as the independent variables. The classification functions obtained are given in Table 7 and the classification
matrix is given in Table 8. Standard mode DA constructed DFs using all 12 parameters to give 91% correct
assignation of cases in the CM. The forward stepwise mode used seven parameters (Mg, K, HCO3, Cl, NO3, SO4 and F)
giving 90% correct assignation and the backward stepwise mode gave 89% correct assignation of cases using only
three parameters (Mg, Cl and NO3). Thus spatial DA identified Mg, Cl and NO3 as the three most important
parameters, which cause the discrimination between the two clusters, followed by K, HCO3, SO4 and F.

Data structure determination and source identification

Principal component analysis was applied to standardized datasets separately for the two clusters delineated by CA
in order to identify and compare the factors influencing the high and low pollution clusters. Before carrying out
PCA,
Table 5 Classification functions for temporal DA

Variables   Standard mode          Forward stepwise mode   Backward stepwise mode
            Pre-mon    Post-mon    Pre-mon    Post-mon     Pre-mon    Post-mon
Ca          −0.12      −0.06       –          –            –          –
Mg          −0.29      −0.21       –          –            –          –
Na          0.04       0.04        –          –            –          –
K           −1.06      −1.00       −0.95      −0.88        –          –
HCO3        0.04       0.03        0.05       0.05         –          –
Fe          9.20       8.97        8.83       8.62         –          –
NO3         0.23       0.22        0.24       0.23         –          –
Cl          0.04       0.04        –          –            –          –
SO4         −0.08      −0.07       –          –            –          –
TH          0.06       0.04        –          –            –          –
pH          69.08      65.66       65.01      61.91        46.70      44.36
F           −10.33     −9.34       −14.00     −12.79       –          –
Constant    −255.42    −233.08     −240.80    −220.37      −162.70    −146.90
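
The way classification functions such as those in Table 5 are applied can be illustrated with a short Python sketch. It evaluates Eq. (4) for each group and, assuming the usual convention for classification functions of this kind, assigns a case to the group with the highest score. The coefficients are the backward stepwise values from Table 5 (pH and constant only); the function names and the two input pH values (the seasonal means from Table 1, used purely as illustrative inputs) are choices made for this example and do not come from the paper's software workflow.

```python
# Backward stepwise classification functions for temporal DA (Table 5):
# score(group) = constant + sum(weight * parameter value), as in Eq. (4).
CLASSIFICATION_FUNCTIONS = {
    "pre-monsoon": {"constant": -162.70, "pH": 46.70},
    "post-monsoon": {"constant": -146.90, "pH": 44.36},
}


def classification_scores(sample, functions=CLASSIFICATION_FUNCTIONS):
    """Evaluate the classification function of every group for one sample."""
    scores = {}
    for group, coeffs in functions.items():
        score = coeffs["constant"]
        for name, weight in coeffs.items():
            if name != "constant":
                score += weight * sample[name]
        scores[group] = score
    return scores


def assign_group(sample, functions=CLASSIFICATION_FUNCTIONS):
    """Assign the sample to the group with the highest score (assumed rule)."""
    scores = classification_scores(sample, functions)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Illustrative inputs: the mean pH of each season from Table 1.
    for ph in (6.94, 6.60):
        print(ph, assign_group({"pH": ph}))
```

With only pH retained, the two functions cross at a pH of roughly 6.75, so under this rule samples above that value are assigned to the pre-monsoon group and samples below it to the post-monsoon group. The same helper works for the other modes or for the spatial functions if the corresponding coefficient columns are supplied.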
Table 7 Classification functions for spatial DA

Variables   Standard mode           Forward stepwise mode   Backward stepwise mode
            Cluster 1   Cluster 2   Cluster 1   Cluster 2   Cluster 1   Cluster 2
Ca          0.13        0.16        –           –           –           –
Mg          0.07        0.15        0.03        0.08        0.04        0.09
Na          0.07        0.06        –           –           –           –
K           −0.73       −0.53       −0.005      0.19        –           –
HCO3        0.03        0.03        0.03        0.03        –           –
Fe          8.13        7.94        –           –           –           –
Cl          0.03        0.04        0.00007     0.01        0.01        0.03
NO3         0.17        0.19        0.02        0.04        0.02        0.05
SO4         −0.02       0.00        0.04        0.07        –           –
TH          −0.03       −0.04       –           –           –           –
pH          53.58       53.40       –           –           –           –
F           −6.50       −10.60      5.35        1.19        –           –
Constant    −198.78     −206.95     −9.08       −18.94      −2.27       −10.23
Table 8 Classification matrix for spatial DA
Group       Percent correct   Cluster 1   Cluster 2
Standard mode
Cluster 1   98.59             70          1
Cluster 2   83.08             11          54
Total       91.18             81          55
Forward stepwise mode
Cluster 1   97.18             69          2
Cluster 2   81.54             12          53
Total       89.71             81          55
Backward stepwise mode
Cluster 1   98.59             70          1
Cluster 2   78.46             14          51
Total       88.97             84          52

the Kaiser–Meyer–Olkin (KMO) and Bartlett’s sphericity tests were performed on the parameter correlation matrix in order to examine the validity of the PCA (Mustapha et al. 2012). For cluster 1, a KMO value of 0.697 (> 0.6) and a Bartlett’s sphericity test significance of p < 0.05 confirmed suitability for PCA. The parameters K, NO3, Fe and pH were excluded from the analysis because their communalities were < 0.5. PCA with varimax rotation was applied to the standardized datasets of the remaining ten parameters. For cluster 2, a KMO value of 0.691 (> 0.6) and a Bartlett’s sphericity test significance of p < 0.05 likewise confirmed suitability for PCA. The parameters K, NO3, Fe and pH were again excluded because their communalities were < 0.5, and PCA with varimax rotation was applied to the standardized datasets of the remaining ten parameters.

PCA of the high and low pollution cluster datasets (cluster 1 and cluster 2) yielded three PCs each with eigenvalues greater than 1, explaining 85.4 and 84% of the total variance in the respective groundwater quality data sets. The eigenvalue measures the significance of a factor, i.e., factors with the greatest eigenvalues are considered the most significant, and eigenvalues of 1.0 or greater are considered significant (Kim and Mueller 1987). The same number of VFs was obtained for the two clusters by performing FA on the PCs. Variable loadings, explained variance and the corresponding VFs are presented in Table 9. Liu et al. (2003) designated factor loadings as ‘weak’, ‘moderate’ and ‘strong’ for absolute loading values of 0.30–0.50, 0.50–0.75 and > 0.75, respectively.

For the data set pertaining to cluster 1, VF1, which explained 47.4% of the total variance, had strong positive loadings on Ca, Mg, TDS, EC and TH and a moderate positive loading on Na. Thus VF1 mainly accounts for calcium and magnesium salts in the water, which result in high hardness. It can also be inferred that the high electrical conductivity and high dissolved solids content of the water samples are contributed predominantly by calcium and magnesium and to a lesser extent by sodium. VF2, explaining 20.9% of the total variance, had strong positive loadings on Cl, Na and SO4. Thus VF2 indicates a Na–Cl water type and also the presence of sodium sulfate in groundwater. VF3, explaining 17.1% of the total variance, had strong positive loadings on HCO3 and F, a moderate positive loading on Na and a moderate negative loading on SO4. The strong positive loadings on F and HCO3 indicate that fluoride dissolution in groundwater is favored in an alkaline environment.

For the data set representing cluster 2, VF1, which explained 46.3% of the total variance, had strong positive loadings on Ca, Cl, TDS, EC and TH and moderate positive loadings on Mg and HCO3. Thus VF1 indicates that the presence of high hardness, electrical conductivity and dissolved solids in the groundwater is mainly
due to the presence of calcium chlorides and bicarbonates and to a lesser extent due to magnesium chlorides and bicarbonates. VF2, explaining 23.3% of the total variance, had strong positive loadings on F, HCO3 and SO4 and moderate positive loadings on EC, TDS and Mg. Thus VF2 indicates higher dissolution of fluoride in an alkaline environment. The presence of sodium and magnesium sulfates is also indicated. VF3, explaining 14.5% of the total variance, had a strong positive loading on Na and a strong negative loading on Mg. Thus VF3 indicates salinity due to sodium ions, possibly from sodium-containing rock formations. The low loading on HCO3 together with the negative loading on Mg may be due to removal of Mg from the groundwater as a magnesium bicarbonate precipitate.

Table 9 Varimax rotated factor loading on significant PCs of cluster 1 and cluster 2
Parameter   Varimax rotated component (cluster 1)   Varimax rotated component (cluster 2)
            VF1   VF2   VF3                         VF1   VF2   VF3

Conclusions

• The present study demonstrated the importance of multivariate statistical analysis in groundwater studies. Basic statistics showed that most of the parameters exceeded the specified desirable limits, while a few parameters exceeded the permissible limits as well. The calculated WQI showed that samples rated as poor, very poor and unfit constitute about 50% of the total samples, indicating that the groundwater at these sites needs some degree of treatment before consumption and also needs to be protected from contamination. The WQI results agree with the observation from basic statistical analysis that many parameters exceeded the desirable limits.
• Different multivariate statistical techniques were applied to evaluate spatial and temporal variations in groundwater quality of Bengaluru city. Hierarchical cluster analysis was useful in classifying the 67 sampling sites into two main clusters as high- and low-pollution areas. This helps in identifying problematic zones in the area where remedial actions need to be focused. Also, grouping the areas having similar groundwater condition may be used to determine the number of sampling sites required for regular monitoring of groundwater quality.
• DA was useful in identifying a few indicator parameters responsible for significant variations (spatial and temporal) in groundwater quality of the study area. pH was identified as the most important parameter, which discriminates between the groundwater quality in the pre-monsoon and post-monsoon seasons and accounts for 72% seasonal assignation of cases. Mg, Cl and NO3 were identified as the three most important parameters discriminating between the two clusters and accounting for 89% spatial assignation of cases.
• Grouping of the measured parameters to identify the underlying factors or processes influencing the groundwater quality in the study region was achieved through PCA. Three principal components (PCs) each were identified for the two clusters. Dissolution of hardness-causing Ca and Mg from bedrock and anthropogenic sources, fluoride dissolution from bedrock in an alkaline environment, and salinity from natural and anthropogenic sources were identified as the main factors influencing the groundwater quality in both clusters.
• Thus, the usefulness of multivariate statistical techniques for analysis and interpretation of complex data sets was illustrated in this study for groundwater quality assessment. The grouping information extracted from cluster analysis can be used to design an optimal sampling strategy, which could reduce the number of sampling stations and associated costs. DA provided data reduction by identifying the most important parameters that need to be monitored in order to study the spatial and temporal variations in water quality. PCA served as a means to identify the parameters that contribute most to temporal variation in the groundwater quality and suggested possible sets of pollution sources. Overall, the multivariate statistical techniques helped in understanding the temporal/spatial variations in groundwater quality and in identifying pollution sources/factors, as an effort towards more effective groundwater quality management.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Abdul-Wahab SA, Bakheit CS, Al-Alawi SM (2005) Principal component and multiple regression analysis in modelling of ground-level ozone and factors affecting its concentrations. Environ Model Softw 20(10):1263–1271
Adams MJ (1998) The principles of multivariate data analysis. In: Ashurst PR, Dennis MJ (eds) Analytical methods of food authentication. Blackie Academic and Professional, London, p 350
APHA (2005) Standard methods for the examination of water and wastewater, 20th edn. American Public Health Association, Washington, DC
Aris AZ, Praveena SM, Abdullah MH, Radojevic M (2012) Statistical approaches and hydrochemical modeling of groundwater system in a small tropical island. J Hydroinf 14:206–220
Avvannavar SM, Shrihari S (2008) Evaluation of water quality index for drinking purposes for river Netravathi, Mangalore, South India. Environ Monit Assess 143:279–290
BIS (2003) Drinking water specifications, Bureau of Indian Standards, 1991, IS:10500 (revised 2003)
Boateng TK, Opoku F, Acquaah SO, Akoto O (2016) Groundwater quality assessment using statistical approach and water quality index in Ejisu-Juaben municipality, Ghana. Environ Earth Sci 75(6):1–14
Brūmelis G, Lapiņa L, Nikodemus O, Tabors G (2000) Use of an artificial model of monitoring data to aid interpretation of principal component analysis. Environ Model Softw 15(8):755–763
Central Ground Water Board (2012) Ground water information booklet, Dakshina Kannada District, Karnataka. Ministry of Water Resources, Government of India
Cloutier V, Lefebvre R, Therrien R, Savard MM (2008) Multivariate statistical analysis of geochemical data as indicative of the hydrogeochemical evolution of groundwater in a sedimentary rock aquifer system. J Hydrol 353(3):294–313
Dawoud MA, Raouf ARA (2009) Groundwater exploration and assessment in rural communities of Yobe State, Northern Nigeria. Water Resour Manag 23(3):581–601
Department of Mines and Geology (2003) Status of ground water quality in Bangalore and its environs. District groundwater brochure, Bangalore
Department of Mines and Geology (2011) Groundwater hydrology and groundwater quality in and around Bangalore city. District groundwater brochure, Bangalore
Forina M, Armanino C, Raggio V (2002) Clustering with dendrograms on interpretation variables. Anal Chim Acta 454:13–19
Güler C, Thyne GD, McCray JE, Turner KA (2002) Evaluation of graphical and multivariate statistical methods for classification of water chemistry data. Hydrogeol J 10(4):455–474
Hassen I, Hamzaoui-Azaza F, Bouhlila R (2016) Application of multivariate statistical analysis and hydrochemical and isotopic investigations for evaluation of groundwater quality and its suitability for drinking and agriculture purposes: case of Oum Ali-Thelepte aquifer, central Tunisia. Environ Monit Assess 188(3):1–20
Helena B, Pardo R, Vega M, Barrado E, Fernández JM, Fernández L (2000) Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga river, Spain) by principal component analysis. Water Res 34:807–816
Horton RK (1965) An index number system for rating water quality. J Water Pollut Control Fed 37:300–305
Hossain MA, Ali NM, Islam MS, Hossain HZ (2015) Spatial distribution and source apportionment of heavy metals in soils of Gebeng industrial city, Malaysia. Environ Earth Sci 73(1):115–126
Jammel A, Hussain AZ (2003) Impact of sewage on the quality of Uyakandan channel water of River Cauvery at Tiruchirapalli. Indian J Environ Prot 23(6):660–662
Jat MK, Garg PK, Khare D (2008) Monitoring and modelling of urban sprawl using remote sensing and GIS techniques. Int J Appl Earth Obs Geoinf 10(1):26–43
Jerry AN (1986) Basic environmental technology (water supply, waste disposal and pollution control). Wiley, New York
Johnson RA, Wichern DW (1992) Applied multivariate statistical analysis, 3rd edn. Prentice-Hall, New Jersey
Juahir H, Zain SM, Yusoff MK, Hanidza TT, Armi AM, Toriman ME, Mokhtar M (2011) Spatial water quality assessment of Langat River basin (Malaysia) using environmetric techniques. Environ Monit Assess 173(1–4):625–641
Kazi TG, Arain MB, Jamali MK, Jalbani N, Afridi HI, Sarfraz RA, Baig JA, Shah AQ (2009) Assessment of water quality of polluted lake using multivariate statistical techniques: a case study. Ecotoxicol Environ Saf 72:301–309
Kim JO, Mueller CW (1987) Introduction to factor analysis: what it is and how to do it. Quantitative applications in the social sciences series
Kumar M, Ramanathan AL, Tripathi R, Farswan S, Kumar D, Bhattacharya P (2017) A study of trace element contamination using multivariate statistical techniques and health risk assessment in groundwater of Chhaprola Industrial Area, Gautam Buddha Nagar, Uttar Pradesh, India. Chemosphere 166:135–145
Lee JY, Cheon JY, Lee KK, Lee SY, Lee MH (2001) Statistical evaluation of geochemical parameter distribution in a ground water system contaminated with petroleum hydrocarbons. J Environ Qual 30(5):1548–1563
Liu CW, Lin KH, Kuo YM (2003) Application of factor analysis in the assessment of groundwater quality in a blackfoot disease area in Taiwan. Sci Total Environ 313(1):77–89
Love D, Hallbauer D, Amos A, Hranova R (2004) Factor analysis as a tool in groundwater quality management: two southern African case studies. Phys Chem Earth 29(15–18):1135–1143
Massart DL, Kaufman L (1983) The interpretation of analytical data by the use of cluster analysis. Wiley, New York
McKenna JE (2003) An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis. Environ Model Softw 18(3):205–220
Mustapha A, Aris AZ (2012) Multivariate statistical analysis and environmental modeling of heavy metals pollution by industries. Pol J Environ Stud 21:1359–1367
Mustapha A, Aris AZ, Juahir H, Ramli MF (2012) Surface water quality contamination source apportionment and physicochemical characterization at the upper section of the Jakara basin, Nigeria. Arab J Geosci
Noshadi M, Ghafourian A (2016) Groundwater quality analysis using multivariate statistical techniques (case study: Fars province, Iran). Environ Monit Assess 188(7):1–13
Otto M (1998) Multivariate methods. In: Kellner R, Mermet JM, Otto M, Widmer HM (eds) Analytical chemistry. Wiley, Weinheim
Pius A, Jerome C, Sharma N (2012) Evaluation of groundwater quality in and around Peenya industrial area of Bangalore, South India using GIS techniques. Environ Monit Assess 184(7):4067–4077
Pradhan SK, Patnaik D, Rout SP (2001) Water quality index for the ground water around a phosphatic fertilizer plant. Indian J Environ Prot 21(4):355–358
Qian J, Wang L, Ma L, Lu Y, Zhao W, Zhang Y (2016) Multivariate statistical analysis of water chemistry in evaluating groundwater geochemical evolution and aquifer connectivity near a large coal mine, Anhui, China. Environ Earth Sci 75(9):1–10
Ravikumar P, Somashekar RK (2012) Assessment and modelling of groundwater quality data and evaluation of their corrosiveness and scaling potential using environmetric methods in Bangalore south taluk, Karnataka state, India. Water Resour 39(4):446–473
Ravikumar P, Somashekar RK (2017) Principal component analysis and hydrochemical facies characterization to evaluate groundwater quality in Varahi river basin, Karnataka state, India. Appl Water Sci 7(2):745–755
Reghunath R, Murthy TS, Raghavan BR (2002) The utility of multivariate statistical techniques in hydrogeochemical studies: an example from Karnataka, India. Water Res 36(10):2437–2442
Sadat-Noori SM, Ebrahimi K, Liaghat AM (2014) Groundwater quality assessment using the water quality index and GIS in Saveh-Nobaran aquifer, Iran. Environ Earth Sci 71(9):3827–3843
Salman SR, Ruka’h YA (1999) Multivariate and principal component statistical analysis of contamination in urban and agricultural soils from north Jordan. Environ Geol 38(3):265–270
Samson S, Elangovan K (2017) Multivariate statistical analysis to assess groundwater quality in Namakkal district, Tamil Nadu, India
Sarbu C, Pop HF (2005) Principal component analysis versus fuzzy principal component analysis: a case study: the quality of Danube water (1985–1996). Talanta 65(5):1215–1220
Shrestha S, Kazama F (2007) Assessment of surface water quality using multivariate statistical techniques: a case study of the Fuji river basin, Japan. Environ Model Softw 22:464–475
Simeonov V, Stratis JA, Samara C, Zachariadis G, Voutsa D, Anthemidis A (2003) Assessment of the surface water quality in northern Greece. Water Res 37:4119–4124
Simeonov V, Simeonova P, Tzimou-Tsitouridou R (2004) Chemometric quality assessment of surface waters: two case studies. Chem Inż Ekol 11(6):449–469
Singh KP, Malik A, Mohan D, Sinha S (2004) Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti river (India): a case study. Water Res 38:3980–3992
Singh KP, Malik A, Sinha S (2005) Water quality assessment and apportionment of pollution sources of Gomti river (India) using multivariate statistical techniques: a case study. Anal Chim Acta 538:355–374
Taoufik G, Khouni I, Ghrabi A (2017) Assessment of physico-chemical and microbiological surface water quality using multivariate statistical techniques: a case study of the Wadi El-Bey river, Tunisia. Arab J Geosci 10(7):181
Tirkey P, Bhattacharya T, Chakraborty S, Baraik S (2017) Assessment of groundwater quality and associated health risks: a case study of Ranchi city, Jharkhand, India. Groundwater for Sustainable Development
Tiwari TN, Mishra M (1985) A preliminary assignment of water quality index of major Indian rivers. Ind J Environ Prot 5(4):276–279
Tziritis E, Skordas K, Kelepertsis A (2016) The use of hydrogeochemical analyses and multivariate statistics for the characterization of groundwater resources in a complex aquifer system. A case study in Amyros river basin, Thessaly, central Greece. Environ Earth Sci 75(4):1–11
Vega M, Pardo R, Barrado E, Deban L (1998) Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Res 32:3581–3592
Wang J, Da L, Song K, Li B (2008) Temporal variations of surface water quality in urban, suburban and rural areas during rapid urbanization in Shanghai, China. 152:387–393
Willet P (1987) Similarity and clustering in chemical information systems. Research Studies Press, Wiley, New York
Wunderlin DA, Diaz MP, Ame MV, Pesce SF, Hued AC, Bistoni MA (2001) Pattern recognition techniques for the evaluation of spatial and temporal variations in water quality, a case study: Suquia river basin (Cordoba, Argentina). Water Res 35:2881–2894
Yidana SM, Yiran GB, Sakyi PA, Nude PM, Banoeng-Yakubo B (2011) Groundwater evolution in the Voltaian basin, Ghana: an application of multivariate statistical analyses to hydrochemical data. Nat Sci 3(10):837
Zhang Y, Guo F, Meng W, Wang XQ (2009) Water quality assessment and source identification of Daliao river basin using multivariate statistical methods. Environ Monit Assess 152:105–121

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.