Assessment of Water Quality for Human Consumption
Received October 18, 2001; accepted June 24, 2002; published online November 11, 2002
Abstract. This study deals with the application of chemometric approaches (cluster analysis and principal components analysis) to potable water monitoring, demonstrated on a data set from the region of Kavala, Greece, analysed according to the standard instructions and directives of the European Union. It is shown that data classification by cluster analysis and data structure modeling by principal components analysis reveal similar results: four different patterns of water source sites are identified, depending on the geographical location of the site (near the Nestos River, near the Strimon River, elevated sites and near-coast sites). Three latent factors, explaining over 85% of the total variance, are responsible for the data structure: ‘‘water acidity (anthropogenic)’’, ‘‘water hardness (natural)’’ and the ‘‘marine factor’’. Their importance for the different sites is related to the site location. Finally, it is recommended to include environmetric data treatment as an essential standard procedure in the assessment of the quality of water intended for human consumption.

Key words: Potable water quality; cluster analysis; principal component analysis.

In view of the importance for public health of water for human consumption, it is necessary to lay down various quality standards with which such water must comply. There are different directives of many European institutions and authorities related to potable water quality, e.g. directives [1–5], to mention just some of the well-established common acts of the EU. It is repeatedly stated that any disparity between provisions already applicable, or in the process of being drawn up, in the various European countries (including those of Eastern Europe) related to drinking water quality may create differences in the conditions of competition and, as a result, directly affect the operation of the whole European Union. Therefore, prescriptions, regulations, standard procedures and laws in this sphere should be implemented as provided by the principles of the Union.

Since the approximation of laws is a long-term task, careful studies on the quality of water intended for human consumption could contribute significantly to the creation of a common strategy for drinking water assessment. The programs and actions of the European countries provide solutions both for setting standards for toxic chemical substances and for bacteria presenting a health hazard in water intended for human consumption, and for defining the physical, chemical and biological parameters corresponding to the different uses of water. It seems that all possible aspects of the quality of water intended for human consumption are considered and the respective rules are laid down. For instance, the values for certain parameters of potable water are fixed and must be equal to or lower than a maximum admissible concentration; in the case of softened water intended for human consumption, the values fixed for a number of parameters must be equal to or higher than a required minimum concentration; the preparation of potable water may involve the use of certain substances, and there are strict rules governing their use in order to avoid harmful effects on public health from excessive quantities of such substances; finally, regular monitoring of the water sources and of the potable water is required.

According to some of the articles of the various directives and agreements, the countries may make provision for derogation from the standard values in order to take account of situations arising from the nature and structure of the ground in the area from which the water supply emanates, situations arising from exceptional meteorological conditions, some local problems or ‘‘hot spot’’ emissions. In emergencies, the competent national authorities may allow changes in the admissible values for a limited time period.

If a regular monitoring procedure for the drinking water sources is applied, large data sets can be collected. These data sets contain rich information about the behavior of the drinking water supplies and the potable water. The classification, modeling and interpretation of the monitoring data could be a very important step in the complete assessment of the quality of water intended for human consumption. By using multivariate statistical approaches such as cluster analysis (CA) or principal components analysis (PCA), one is able to derive hidden information from the data set about the possible influences of the environment (anthropogenic or naturally occurring) on water quality. Thus, sudden situations could be explained, ‘‘hot spot’’ emitters could be identified and estimated, and various water sources could be carefully compared. The information obtained by such intelligent data analysis (IDA) would then be a very useful addition to the required comparison of concentration values with admissible thresholds. Since the water sources for human consumption in one and the same region can differ considerably with respect to chemical content or physicochemical parameters, the IDA strategy makes it possible to introduce additional important tools in an overall water quality estimation.

It is the aim of the present study to offer a simple multivariate statistical strategy as an addition to the recommended monitoring procedures in the assessment of the quality of water intended for human consumption in the region of the Kavala municipality, Greece.

Experimental

Eleven sampling sites in the Kavala region, Greece, were monitored for the quality of water for human consumption. The locations of the potable water sampling sites (1 Chrisoupoli, 2 Krinides, 3 Lydia, 4 Filippi, 5 Polistilo, 6 Kokinohoma, 7 Elefteroupoli, 8 Hortokopi, 9 Akrovouni, 10 Exohi and 11 Peramos) are presented in Fig. 1. As seen, they are in the vicinity of the City of Kavala, between the rivers Strimon and Nestos. The water sources for human consumption are either springs or wells.

Fig. 1. Site location in the region of Kavala, Greece

Eleven quality parameters of the drinking water were monitored over a period of three years (1998–2000): conductivity, pH, total alkalinity, total hardness, chloride, bicarbonate, nitrate, calcium, magnesium, potassium and sodium. The parameters were checked monthly. The analytical methods were standard procedures as recommended by European directive 80/778 [1]. Potentiometric methods were applied for conductivity and pH measurements; atomic absorption spectrometry was used for the determination of calcium, magnesium, potassium and sodium; chloride was measured by UV-Vis spectrophotometry; titrimetry was applied for total alkalinity and bicarbonate, and complexometry for total hardness; finally, absorption spectrophotometry was used for nitrate analysis.

For the chloride measurement, a Hitachi U-2000 double-beam UV-Vis spectrophotometer was used along with Merck chemicals (Spectroquant). According to the method, for Cl⁻ concentrations of 0.5–90 mg L⁻¹, measurements are taken at 450 nm. The calibration curve is prepared with standards made from NaCl previously dried for two hours at 150 °C. A 1-cm cell is used for the measurement, and a volume of 10 mL is taken for all standards and samples. Mercury(II) thiocyanate and iron(III) nitrate are then added to the sample. Chloride ions displace thiocyanate from mercury(II) thiocyanate (forming mercury(II) chloride); the released thiocyanate, in 0.1 N nitric acid solution, reacts with iron(III) nitrate to form orange-red iron(III) thiocyanate. By measuring the intensity of this orange-red color at 450 nm, the Cl⁻ concentration can be determined from the calibration curve. The conditions for sampling, sample preparation and calibration, the limits of detection and the procedure uncertainty are described in detail elsewhere [1]. The chemical parameters were measured in mg L⁻¹ and conductivity in mS cm⁻¹. The analytical measurements were performed in the Laboratory of Instrumental Analysis, Department of Natural Sciences, Technical Educational Institution (TEI) of Kavala. The data sets are too large to be published in full and are available from the authors on request.

The data quality was checked by parallel analytical determinations in several laboratories (in Kavala, Thessaloniki and Sofia), and the results indicated that the analytical uncertainties do not exceed 1–3% relative standard deviation from the mean value for at least 25 parallel samples.

Two chemometric approaches were used in order to classify, model and interpret the water quality data. The first one is cluster analysis (CA) [6], a typical classification procedure that makes it possible to detect similarities or dissimilarities within a large group of objects characterized by a certain number of variables. In the first step of the classification, the input data matrix (n objects × m variables) is normalized to dimensionless values by z-transformation (subtracting the mean of each variable and dividing by its standard deviation), in order to avoid classification problems with objects described by variables of completely different magnitude (e.g. concentrations varying from ppm to per cent values). Then, a similarity measure is applied to calculate the distance between all objects of interest. Usually, the Euclidean distance is a reliable measure of similarity between the classified objects. Finally, an appropriate linkage algorithm (single, average, centroid or Ward's linkage, etc.) is used to link objects at similar distances into a group (cluster) and to separate those located at large distances. The interpretation of the clusters is an important task and allows a reliable explanation of the reasons leading to one or another type of clustering. In the same way, a clustering of the variables is possible, with follow-up interpretation of the reasons leading to variable linkage. In environmental chemistry, CA is applied to classify multivariable ecological objects, to detect similarities in site locations, to pinpoint differences in emitting sources, etc. [7–15].

Clustering of Sites

The cluster analysis (z-transformed input data, squared Euclidean distance as similarity measure, Ward's method of linkage) of all 264 cases of observations (11 sampling sites, 11 chemical and physicochemical parameters, 2 years of monitoring, monthly frequency of analytical determination) resulted in the formation of four clusters (Fig. 2):
– Cluster 1 (dominantly cases from sites 2–8);
– Cluster 2 (dominantly cases from site 1);
– Cluster 3 (dominantly cases from sites 9 and 10);
– Cluster 4 (dominantly cases from site 11).

It is interesting to note that clusters 1, 2 and 3 are very close to each other, whereas cluster 4 is located at a relatively large distance from the other three. The quality of the water intended for human consumption from site 11 (Peramos) is quite different from that of the other sites. Indeed, the location of Peramos (very close to the coast) requires water delivery from sources containing higher concentrations of potassium, sodium and chloride (marine influence) and, hence, having higher conductivity. The purification of water for direct human consumption should be coordinated with the specific location of the site.

Similar considerations hold true for site 1 (Chrisoupoli), which is the only site located in the vicinity of the Nestos River. In this particular region the water sources are of different origin (lower concentrations of potassium, sodium and chloride and lower conductivity, but significantly higher levels of total hardness, total alkalinity, bicarbonate, calcium and magnesium).

The close similarity between sites 2 to 8 is also predetermined by their geographical (and geological) locations. These sites are in the vicinity of the Strimon River, with chemical characteristics slightly different from those of the Nestos River wells and springs. The water quality parameters for this group of sites are considerably closer to those of site 1 but differ from those of site 11.

Finally, the last cluster of sites (9 and 10) can be interpreted as another group with slightly different parameters of the water quality intended for human consumption. The sites Akrovouni and Exohi are at a somewhat higher elevation, and their water originates from mountain wells with higher pH values and elevated calcium and magnesium concentrations.

Considering the results of the cluster analysis, it may be postulated that four principal water sources can be identified in the region of Kavala, conditionally named ‘‘coastal’’ (site 11), ‘‘Nestos’’ (site 1), ‘‘Strimon’’ (sites 2 to 8) and ‘‘elevated’’ (sites 9 and 10). This result is important information additional to the monitoring results, allowing responsible decisions about local policy on water management and purification. Sites that differ in chemical composition can easily be separated by simple multivariate clustering instead of elaborate checking of the monitoring data for each parameter and each site. Moreover, this result is an important prerequisite for determining the data structure or the influence of possible natural or anthropogenic sources on water quality.

Clustering of Chemical Parameters

CA allows finding similarity not only between cases (separate observations of the water quality) but also between the chemical parameters determined during the monitoring. This type of clustering leads to the formation of the following groups of similarity among the measured characteristics (again, 264 single cases were treated, this time not with respect to the 11 sampling sites but with respect to the ten parameters measured: conductivity, total alkalinity, total hardness, chloride, bicarbonate, nitrate, calcium, magnesium, potassium and sodium; pH was not included in the statistical analysis, as the monitored values are very similar):
– Cluster 1: (conductivity, chloride, potassium, sodium);
– Cluster 2: (total alkalinity, bicarbonate, total hardness, calcium, magnesium);
– Cluster 3: (nitrate).

It may be assumed that three main factors influence the water quality of the region: the first includes parameters of marine origin; the second, naturally occurring parameters determining water alkalinity and hardness; and the third probably reflects anthropogenic influences related to enhanced nitrate concentration. In order to check the validity of this hypothesis, PCA has to be performed additionally to identify the latent factors in a more quantitative manner.

Principal Components Analysis

The aim of the PCA was to detect the hidden factors responsible for the data structure, both when the whole data set is considered (all sites and all parameters, i.e. 264 cases altogether) and when each site is treated separately (24 cases for each of the 11 sites). In this way, the role of the factors can be compared on a large scale (all sites together) and on a local scale (separate sites). We are aware that performing PCA on a small number of cases (only 24 for the separate sites, with a considerable number of parameters) is not the most favorable way to interpret latent factors. On the other hand, it is the only way to compare the various possible sources influencing the water quality at all sites.
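As an illustration only, the PCA step described above can be sketched in plain Python. The sketch assumes the input is a list of cases (rows) by parameters (columns); it standardizes the columns by z-transformation, as in the CA step, builds the correlation matrix, diagonalizes it with cyclic Jacobi rotations, and reports each component's loadings (eigenvector × √eigenvalue) together with its explained-variance fraction. No factor rotation is applied, and the helper names (`zscore_columns`, `jacobi_eigen`, `pca_loadings`, `significant`) are hypothetical; the authors do not state which software they used. The |loading| ≥ 0.7 cut-off in `significant` only mirrors the 0.7 significance value quoted for the Table 1 loadings and is not an implementation of Malinowski's test.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1.

    A constant column would give a standard deviation of 0 and must be
    removed beforehand.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, sds)] for row in rows]


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v), where column k of v is the eigenvector
    belonging to eigenvalue k.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that element (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca_loadings(rows, n_components):
    """Return [(explained_variance_fraction, loadings), ...] for the leading PCs."""
    z = zscore_columns(rows)
    n, m = len(z), len(z[0])
    # Correlation matrix of the original variables = covariance of the z-scores.
    r = [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1) for j in range(m)]
         for i in range(m)]
    eigvals, vecs = jacobi_eigen(r)
    order = sorted(range(m), key=lambda i: eigvals[i], reverse=True)
    components = []
    for i in order[:n_components]:
        lam = max(eigvals[i], 0.0)
        loadings = [vecs[k][i] * math.sqrt(lam) for k in range(m)]
        # Eigenvector signs are arbitrary; make the largest loading positive.
        if max(loadings, key=abs) < 0:
            loadings = [-x for x in loadings]
        # The eigenvalues of an m x m correlation matrix sum to m.
        components.append((lam / m, loadings))
    return components


def significant(loadings, threshold=0.7):
    """Indices of the variables whose |loading| reaches the threshold."""
    return [i for i, x in enumerate(loadings) if abs(x) >= threshold]
```

For example, with three columns where the second is an exact linear function of the first and the third is uncorrelated with both, PC 1 should explain about two thirds of the variance with loadings of about 1 on the first two columns. For the per-site models, the same function would be called on the 24 rows of a single site; as noted above, such small samples make the loadings less stable.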
Table 1. Factor loadings for the chemical variables for the whole region of Kavala (all 264 cases from all sites); loadings marked with an asterisk are statistically significant according to Malinowski's test

Parameters           PC 1     PC 2     PC 3
Conductivity         0.90*    0.38     0.16
Total alkalinity     0.10     0.92*    0.17
Total hardness       0.07     0.65     0.71*
Chloride             0.99*    0.01     0.03
Bicarbonate          0.10     0.94*    0.09
Nitrate              0.21     0.51     0.31
Calcium              0.01     0.85*    0.18
Magnesium            0.13     0.21     0.94*
Potassium            0.98*    0.13     0.01
Sodium               0.99*    0.05     0.02
Explained variance   37.9%    33.3%    15.5%

Table 1 presents the factor loadings for three principal components (PCs, or latent factors) obtained for the whole data set (264 cases). As seen, these PCs explain over 85% of the total variance of the system under consideration and can be conditionally named PC1, the ‘‘marine’’ factor (with significant loadings of conductivity, chloride, potassium and sodium); PC2, the ‘‘water acidity’’ factor (with significant loadings of total alkalinity, bicarbonate and calcium); and PC3, the ‘‘water hardness’’ factor (with significant loadings of total hardness and magnesium). Nitrate is a more specific case: it belongs more strongly to the acidity factor, although its loading is below the accepted significance value of 0.7. Since its loading of 0.51 is on PC2, it is logical to relate it to the ‘‘water acidity’’ factor, which in this way becomes an anthropogenically influenced factor.

This model of the data structure corresponds to the classification results obtained by cluster analysis, which distinguish between marine, natural and anthropogenic effects.

In the PCA modeling of the single sites (Table 2), four different site patterns could be detected with respect to the content of the principal components for each site.

Site 1 (Chrisoupoli) shows a pattern with a dominating anthropogenic factor (PC 1, explained variance 43.12%), followed by a natural factor (PC 2, explained variance 30.61%) and a marine factor (PC 3, explained variance 10.12%). This latent factor distribution seems typical for a water source site located near a large settlement, far from the sea and near a river spring.

For sites 2 to 8, another type of latent factor pattern is found. For all sites in this group, the most significant latent factor is water hardness (the natural factor, PC 1, explained variance 38.16–44.29%), followed by the anthropogenic factor (water acidity, PC 2, explained variance 30.06–33.76%) and the marine factor (PC 3, explained variance 9.87–10.56%). This situation reflects the specific location of these sites in an area near the Strimon River and, again, far from the coastline of the Gulf.

Sites 9 and 10 show the third pattern among the sites of the Kavala region: the highest explained variance is accounted for by the natural factor (water hardness PC, nearly 37%), followed by the marine influence (PC 2, about 30% of explained variance) and, finally, the anthropogenic factor (water acidity PC, explained variance close to 10%). In this case, the elevated local terrain is probably responsible for the different pattern of water quality for human consumption.

The fourth and last pattern is represented by site 11 (Peramos), where the marine factor is PC 1 with 44.45% of explained variance, the water hardness factor is the second principal component (explained variance 33.11%) and the anthropogenic influence is represented by PC 3 (water acidity, 9.87% of explained variance). It may be assumed that the location of Peramos close to the sea creates a specific distribution of the sources of water intended for human consumption.

This PCA modeling scheme corresponds to the classification obtained by cluster analysis.

Table 2. Distribution of the correlated parameters in each latent factor (principal component) for each site. TH total hardness; TA total alkalinity; cond conductivity; Expl. var. explained variance

Site  PC 1                   PC 2                   PC 3                   Expl. var.
1     TA, HCO3⁻, Ca, NO3⁻    Mg, TH                 Na, K, Cl⁻, cond       83.8%
2     TH, Ca, Mg             TA, HCO3⁻, NO3⁻        Na, K, Cl⁻, cond       79.1%
3     TH, Ca, Mg             TA, HCO3⁻, NO3⁻        Na, K, Cl⁻, cond       76.9%
4     TH, Ca, Mg             TA, HCO3⁻, NO3⁻        Na, K, Cl⁻, cond       80.2%
5     TH, Ca, Mg             TA, HCO3⁻, NO3⁻        Na, K, Cl⁻, cond       75.7%
6     TH, Ca, Mg             TA, HCO3⁻, NO3⁻        Na, K, Cl⁻, cond       82.1%
7     TH, Ca, Mg             TA, HCO3⁻, NO3⁻        Na, K, Cl⁻, cond       73.9%
8     TH, Ca, Mg             TA, HCO3⁻, NO3⁻        Na, K, Cl⁻, cond       84.2%
9     TH, Ca, Mg             Na, K, Cl⁻, cond       TA, HCO3⁻, NO3⁻        77.7%
10    TH, Ca, Mg             Na, K, Cl⁻, cond       TA, HCO3⁻, NO3⁻        75.8%
11    Na, K, Cl⁻, cond       TH, Ca, Mg             TA, HCO3⁻, NO3⁻        87.4%

The assessment study carried out has indicated that
– estimate the contribution of each latent factor to the chemical and biological composition of the potable water.

The parallel performance of water monitoring, comparison with admissible levels and chemometric analysis of the monitoring data will offer a real assessment of the quality of water intended for human consumption in a given region.
