Wan 14 Waterquality
Copyright © American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. 5585 Guilford Rd., Madison, WI 53711 USA. All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.

J. Environ. Qual. 43:599–610 (2014)
doi:10.2134/jeq2013.09.0355
Received 6 Sep. 2013.
*Corresponding author ([email protected]).

Y. Wan and C. Conrad, South Florida Water Management District, 3301 Gun Club Road, West Palm Beach, FL 33406; Y. Qian, K.W. Migliaccio, and Y. Li, Soil and Water Science Dep. and Agricultural and Biological Engineering Dep. at Tropical Research and Education Center, IFAS, Univ. of Florida, 18905 SW 280th Street, Homestead, FL 33031. Assigned to Associate Editor Ying Ouyang.

Abbreviations: CA, cluster analysis; CPC, common principal component; DA, discriminant analysis; FA, factor analysis; IRL, Indian River Lagoon; PCA, principal component analysis; SLE, St. Lucie Estuary; SFWMD, South Florida Water Management District; STA, storm-water treatment area; TFe, total iron; TKN, total Kjeldahl nitrogen; TP, total phosphorus; TSS, total suspended solids.
2002; Kaufman and Rousseeuw, 1990; Wunderlin et al., 2001). Thus, the results of CA typically exhibit significant homogeneity within clusters and heterogeneity between clusters. Cluster analysis is a convenient way to explore water quality patterns associated mostly with spatial variability (by sampling sites) (e.g., Singh et al., 2004; Shrestha and Kazama, 2007; Zhang et al., 2009). Some researchers have also used CA for seasonal grouping (by months) to evaluate temporal variations (e.g., Zhou et al., 2007; Yang et al., 2010). Because it is not necessary to know the class characteristics of the data in advance, CA is called unsupervised pattern recognition (Fraley and Raftery, 2002; Kowalkowski et al., 2006; Wunderlin et al., 2001). In contrast, DA is a statistical supervised pattern recognition method used to discriminate a priori known groups or clusters (e.g., data from different seasons or regions) by determining the variables with significant mean differences among them (Statsoft Inc., 1984; Fraley and Raftery, 2002; Insightful Corporation, 2005).

Discriminant analysis has been applied to identify the most significant water quality parameters discriminating between groups, and these parameters are considered to account for the spatial or temporal variations in water quality (Singh et al., 2004; Shrestha and Kazama, 2007). Thus, DA can render considerable reduction of the dimensionality of the original data matrix. Discriminant analysis is also useful for determining temporal and spatial variations caused by natural and anthropogenic factors linked to seasonality. Santos-Roman et al. (2003) related water quality groups to physical characteristics of watersheds using DA and thus predicted water quality for unmonitored watersheds.

Both PCA and FA are exploratory methods concerned with explaining the variance–covariance structure of the data. Principal component analysis is generally applied for data reduction and interpretation of water quality data (e.g., Petersen et al., 2001; Ouyang, 2005). Factor analysis, as an extension of PCA, aims to identify the underlying but unobservable quantities, called factors, by analyzing the covariance relationships among multiple variables. This is commonly achieved through varimax rotation (orthogonal), which redistributes the variance of each variable to allow a high loading on a single factor and low loadings on the other factors (Johnson and Wichern, 1992). Thus, principal components are linear combinations of observable water quality variables, whereas FA identifies unobservable, hypothetical, latent variables (Vega et al., 1998). Principal component analysis and FA are commonly applied together in water quality data analysis to identify pollution sources, i.e., naturally occurring (weathering or geological processes) or anthropogenic (agricultural, industrial, or domestic origins) (e.g., Wunderlin et al., 2001; Singh et al., 2004;
Table 1. Selected surface water quality studies using multivariate techniques published since 2001.

| Study | Multivariate techniques† | Auxiliary analysis‡ | Study area (Country) | Data analysis period and sampling interval (yr) | Sampling sites (no.) | Water quality parameters (no.) | Study objectives |
|---|---|---|---|---|---|---|---|
| Wunderlin et al. (2001) | CA, FA, DA | | Suquia River (Argentina) | 2 (1998–2000, monthly) | 9 | 22 | spatial and temporal analysis, data reduction, source identification |
| Petersen et al. (2001) | PCA | bootstrap procedure | Elbe River (Germany) | 5 (1994–1997, once every 14 d) | 14 | 7 | identification of processes affecting water quality |
| Simeonov et al. (2003) | CA, PCA | RM | River systems (Greece) | 3 (1997–2000, monthly) | 25 | 27 | spatial analysis, data reduction, source identification and |
| Singh et al. (2004) | CA, PCA/FA, DA | | Gomti River (India) | 5 (1994–1998, monthly) | 8 | 24 | spatial and temporal analysis, data reduction, source identification |
| Singh et al. (2005) | CA, PCA/FA, DA | RM | Gomti River (India) | 3 (1999–2001, monthly) | 8 | 34 | spatial and temporal analysis, data reduction, source identification and apportionment |
| Ouyang (2005) | CA, FA | | St. Johns River (USA) | 3 (1999–2001, daily or monthly) | 22 | 42 | evaluation of monitoring network, identification of essential |
| Zhou et al. (2007) | CA, DA | | rivers (Hong Kong) | 5 (2000–2004, monthly) | 23 | 23 | spatial and temporal analysis, data reduction, source identification |
| Shrestha and Kazama (2007) | CA, PCA/FA, DA | | Fuji River (Japan) | 8 (1995–2002, monthly) | 13 | 25 | spatial and temporal analysis, data reduction, source identification |
| Pejman et al. (2009) | CA, PCA/FA | | Haraz River (Iran) | 2 (2007–2008, seasonal) | 8 | 10 | spatial and temporal analysis, source identification |
| Zhang et al. (2009) | CA, PCA/FA, | | Xiangjiang River (China) | 7 (1994–2000) | 34 | 12 | spatial analysis, data reduction, source identification |
| Varol and Şen (2009) | CA, PCA/FA | | Behrimaz Stream (Turkey) | 1 (2003, monthly) | 4 | 20 | spatial and temporal analysis, data reduction, source identification |
| Yang et al. (2010) | CA, PCA/FA, DA | IDW | Lake Dianchi (China) | 5 (2003–2007, monthly) | 8 | 12 | spatial and temporal analysis, data reduction, source identification |
| Koklu et al. (2010) | PCA/FA, DA | MLR | Melen River (Turkey) | 11 (1995–2006, once 2–3 mo) | 5 | 26 | data reduction, source identification |
| Li et al. (2011) | CA, PCA/FA | RM, ANOVA | 19 rivers (China) | 2 (4 sampling trips in 2006 and 2007) | 19 | 21 | spatial and temporal analysis, source identification, source |
| Mustapha and Abdu (2012) | PCA/FA | MLR | Jakara River (Nigeria) | 0.17 (July 31 to Sept. 30, 2011, daily) | 4 | 15 | source identification |

† CA, cluster analysis; PCA, principal components analysis; FA, factor analysis; DA, discriminant analysis.
‡ RM, receptor modeling; MLR, multiple linear regression; IDW, inverse distance weighting; ANOVA, analysis of variance.
depth for each basin using the Thiessen Polygon method. The period of record for the analysis was from 1981 through 2004.

The 12 water quality parameters selected for analysis in this study were: DO (mg/L), specific conductivity (mS/cm), pH, turbidity (nephelometric turbidity units, NTU), color (Pt–Co units, PCU), total suspended solids (TSS, mg/L), NO3–N + NO2–N (NOX–N, mg/L), NH4–N (mg/L), total Kjeldahl N (TKN, mg/L), PO4–P (mg/L), TP (mg/L), and total Fe (TFe, mg/L). Analytical methods are summarized in Table 3.

The SFWMD developed land use/land cover GIS data layers for 1988, 1995, 1999, and 2004 by photointerpretation of aerial photography and digital orthophotographic quarter quadrangles. Each layer was processed to derive the percentage of major land use types for each basin. Land use types were aggregated into seven categories: pasture, citrus, other agriculture, urban (including transportation), wetland, forest, and water (Table 2). These categories are reflective of basin-specific land and water management practices and are consistent with the land use types used for hydrologic simulations of the basins (Wan et al., 2006). Because flows are routed into the primary canals mainly through secondary and tertiary canals, land use of the “buffer zone” along the primary canal was considered an insignificant factor influencing spatial variation in water quality (Carey et al., 2011), and thus the “buffer zone” concept was not examined in this study.

Data Treatment and Statistical Methods
Before performing statistical analyses, the data were logarithmically transformed to normalize the distribution of each water quality parameter and minimize the effects of outliers. All statistical analyses were performed using S-Plus (Version 7.0, Insightful Corporation, 2005).

Cluster Analysis
For each station, the mean values of the transformed data were normalized to minimize the effects of the scale of the units on the clustering, so that each variable was considered equally important (Kaufman and Rousseeuw, 1990):

$$z_{if} = \frac{x_{if} - m_f}{s_f} \tag{1}$$
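The log transformation and the normalization in Eq. [1] can be illustrated with a minimal sketch. The definitions of m_f and s_f are not visible in this excerpt, so the sketch assumes m_f is the mean of variable f across stations and s_f is the sample standard deviation; the base-10 logarithm, the station names, and the TP values are likewise hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev


def station_log_means(samples):
    """Mean of the log10-transformed samples for each station (log base is an assumption)."""
    return {station: mean([math.log10(x) for x in values])
            for station, values in samples.items()}


def standardize(values_by_station):
    """Apply z_if = (x_if - m_f) / s_f across stations for one variable f."""
    values = list(values_by_station.values())
    m_f = mean(values)    # assumption: m_f is the mean across stations
    s_f = stdev(values)   # assumption: s_f is the sample standard deviation
    return {station: (x - m_f) / s_f
            for station, x in values_by_station.items()}


# Hypothetical TP samples (mg/L) at four hypothetical stations
tp_samples = {
    "S1": [0.30, 0.22, 0.41],
    "S2": [0.12, 0.15, 0.10],
    "S3": [0.20, 0.18, 0.25],
    "S4": [0.08, 0.09, 0.11],
}

z_tp = standardize(station_log_means(tp_samples))
for station, z in z_tp.items():
    print(f"{station}: {z:+.2f}")
```

After this step every variable has zero mean and unit spread across stations, so a variable reported in PCU contributes to the Euclidean distance on the same footing as one reported in mg/L.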
Groups of Monitoring Stations
The classification patterns obtained by both group-average linkage and Ward’s method using Euclidean distance were similar, and only the dendrogram obtained from the group-average method is presented (Fig. 2). The seven monitoring stations were clustered into four groups. Group 1 (G1) included C23S48, C23S97, and C24S49; Group 2 (G2) included C25S50 and C25S99; Group 3 (G3) consisted of C44S80; and Group 4 (G4) consisted of C44S308. This grouping is consistent with the basin delineation and water/land management. Group 4 (C44S308) had the greatest separation from the other stations. Group 3 (C44S80) also had distinctly different features than the other groups. These two stations control the discharge from Lake Okeechobee and C-44 basin runoff to the SLE (Fig. 1). Group 1 stations receive flows from the C-23 and C-24 basins, which have similar land use and development history, with pasture as the predominant land use (36 and 43%, respectively) (Table 2). In contrast, G2 stations receive storm water from the C-25 basin, with citrus being by far the dominant land use (47%).

Fig. 2. Dendrogram of the cluster analysis of seven water quality monitoring stations on the C-23, C-24, C-25, and C-44 canals.

Spatial Variation of Discriminating Constituents
The DA results obtained from the CPC model, the canonical discriminant function, and the classical heteroscedastic model provided the same result (Table 4). Dissolved O2, specific conductivity, pH, color, TSS, and turbidity were selected as discriminating constituents using all data or seasonal data. Among the nutrients, PO4–P, TP, and NH4–N were also selected as discriminating constituents using all data or seasonal subsets; NOX–N was a discriminating constituent only for the dry-season data, while TKN and TFe were discriminating constituents using all data and dry-season data. During the dry season, all CA groups exhibited greater spatial heterogeneity than in the wet season, reflecting the influence of seasons on water quality (Qian et al., 2007).

Table 4. Water quality discriminating constituents identified by discriminant analysis.

| Water-quality parameters | All data | Dry-season data | Wet-season data |
|---|---|---|---|
| Dissolved O2 | yes | yes | yes |
| Specific conductivity | yes | yes | yes |
| pH | yes | yes | yes |
| Turbidity | yes | yes | yes |
| Color | yes | yes | yes |
| Total suspended solids | yes | yes | yes |
| NOX–N | no | yes | no |
| NH4–N | yes | yes | yes |
| Total Kjeldahl N | yes | yes | no |
| PO4–P | yes | yes | yes |
| Total P | yes | yes | yes |
| Total Fe | yes | yes | no |

All water quality constituents showed distinct spatial patterns (Table 5). The p values derived from the Kruskal–Wallis test for water quality constituents were all <0.01, indicating significant spatial variation in water quality among CA groups. For example, G1 had the highest color, TP, and PO4–P; G2 had the lowest DO, turbidity, and TP and the highest specific conductivity and NH4–N; while G4 had the highest DO, turbidity, TSS, and TKN and the lowest NH4–N, specific conductivity, and color. Both G3 and G4 had greater DO, turbidity, and TSS and lower specific conductivity and color than G1 and G2. The median values of NOX–N were similar between G3 and G4, as well as between G1 and G2, although G3 and G4 exhibited higher medians than G1 and G2.

Two-Step Principal Component Analysis and Factor Analysis
The PCA/FA of the annual mean concentrations of the water quality constituents identified four factors with eigenvalues >1, explaining 75% of the total variance of the water quality data set (Table 6). A high loading of a variable on a factor shows a strong relationship between the factor and the respective variable. Liu et al. (2003) classified the significant loadings as strong (absolute loading value >0.75), moderate (0.50–0.75), and weak (0.30–0.50). This classification was commonly adopted in later studies (e.g., Ouyang, 2005; Singh et al., 2005). In the present study, factor loadings >0.5 were considered significant. As shown in Table 6, the first factor, explaining 26.88% of the total variance, had significant positive loadings on TP, PO4–P, NH4–N, TKN, and color. The second factor, explaining 18.47% of the total variance, had positive loadings on TSS, turbidity, and NOx–N. The third factor, explaining 16.48% of the total variance, had positive loadings on pH and DO and a negative loading on color. The fourth factor, explaining 13.44% of the total variance, had a negative loading on specific conductivity and a positive loading on TFe.

The PCA/FA of the annual means of the water quality data as well as rainfall depth, flow, and land use percentage identified six factors with eigenvalues >1, explaining 76% of the total variance of the data set (Table 7). The first factor explained 23.73% of the total variance, with significant positive loadings on TP, PO4–P, NH4–N, TKN, color, and pasture and negative loadings on citrus and water. The second factor, explaining 13.97% of the total variance, had positive loadings on TSS, turbidity, NOx–N, and flow. The third factor, explaining 10.94% of the total variance, had significant positive loadings on rainfall, flow, and TFe and a
significant negative loading on specific conductivity. The fourth factor, explaining 10.48% of the total variance, had significant positive loadings on pH, DO, urban, and other agriculture land. The fifth factor, explaining 8.86% of the total variance, had a significant negative loading on forest and a significant positive loading on TKN. The sixth factor, explaining 7.83% of the total variance, had significant negative loadings on DO and wetland.

The Pearson and Spearman’s rank correlation procedures resulted in similar correlations between water quality constituents and land use percentage, rainfall, and flow, with Pearson giving a stronger diagnosis in some cases, and thus only the Pearson correlation matrix is reported (Table 8). Pasture and citrus are perhaps the most dominant land use types influencing water quality. For example, TP, PO4–P, TKN, and NH4–N were all significantly correlated with pastures (positive) and citrus (negative). Associated with this correlation pattern was significant correlation (negative) between these nutrients and the water land use type. Significant correlations (negative) between forest and TP, TKN, and NH4–N were detected. Urban land also showed significant correlations with some of these constituents. Color was significantly correlated with pasture (positive), citrus (negative), and water (negative), while DO was significantly correlated with pasture (negative), other agriculture (positive), urban (positive), and wetland (positive). Rainfall was significantly correlated with DO (negative), specific conductivity (negative), and TFe (positive). Flow was significantly correlated with specific conductivity (negative), TSS (positive), turbidity (positive), NOx–N (positive), and TFe (positive).

Table 6. Rotated loading matrix (varimax) using the annual mean concentrations of water quality constituents. Significant factor loadings are in bold type.

| Parameter | Factor 1 | Factor 2 | Factor 3 | Factor 4 |
|---|---|---|---|---|
| Total P | 0.92 | −0.14 | −0.05 | 0.14 |
| PO4–P | 0.80 | −0.26 | −0.20 | 0.25 |
| NH4–N | 0.79 | 0.13 | −0.16 | −0.40 |
| Total Kjeldahl N | 0.69 | 0.00 | 0.12 | 0.00 |
| Color | 0.64 | −0.29 | −0.59 | 0.16 |
| Total suspended solids | 0.00 | 0.86 | 0.04 | 0.02 |
| Turbidity | −0.02 | 0.78 | 0.26 | 0.39 |
| NOx–N | −0.23 | 0.73 | 0.07 | 0.18 |
| pH | 0.13 | 0.11 | 0.88 | 0.12 |
| Dissolved O2 | −0.18 | 0.08 | 0.80 | 0.00 |
| Specific conductivity | 0.08 | −0.28 | −0.26 | −0.80 |
| Total Fe | 0.37 | 0.25 | −0.34 | 0.72 |
| Eigenvalues | 3.23 | 2.22 | 1.98 | 1.61 |
| Variance explained, % | 26.88 | 18.47 | 16.48 | 13.44 |

Discussion
While spatial variations in water quality were logically linked to contributing sources associated with domestic or industrial origins in the literature (e.g., Simeonov et al., 2003; Zhou et
al., 2007; Li et al., 2011), source identification in this study was not readily apparent with PCA/FA of the water quality data alone (Table 6). While the first factor in Table 6 (TP, PO4–P, NH4–N, TKN, and color) can be considered nutrient related, it is difficult to identify specific pollution sources or anthropogenic activities in association with these factors. The PCA/FA of both water quality data and land use and hydrometric data (Table 7) revealed clear associations of specific water quality parameters with rainfall depth, flow, and land use, allowing identification of basin-specific water and land management practices that cause spatial variations in water quality.

Water Resources Management
Spatial patterns of water quality have been linked to hydrologic processes and water resources management through pollutant transport mechanisms such as leaching, dilution, and wash-off phenomena (Helsel and Hirsch, 1992; Ravichandrana et al., 1996; Swanson et al., 2000). Our two-step PCA/FA suggested that there existed a latent factor for TSS, turbidity, and NOx–N (Factor 2 in Table 6), and the factor was related to flow (Factor 2 in Table 7). The most significant water resources management practice affecting flow is flood control releases of freshwater from Lake Okeechobee to the SLE via C-44 (Fig. 1). The separation of C44S308 and C44S80 from the remaining stations in the cluster analysis (Fig. 2) reflects the influence of lake releases. A further distinction between C44S308 and C44S80 shown in Fig. 2 was probably related to the dry-season water supply releases from Lake Okeechobee via C44S308, which were not readily captured by water quality samples collected at C44S80. High flows at C44S80 and C44S308 associated with lake releases were
Table 8. Pearson correlation matrix between water quality constituents and rainfall, flow, and land use.

| Water-quality parameter | Rainfall | Flow | Pasture | Citrus | Other agriculture | Urban | Wetland | Forest | Water |
|---|---|---|---|---|---|---|---|---|---|
| Dissolved O2 | −0.31** | 0.07 | −0.32** | 0.06 | 0.26 | 0.30** | 0.29** | 0.05 | −0.05 |
| Specific conductivity | −0.35** | −0.48** | 0.44** | −0.19 | −0.34** | −0.32** | −0.04 | −0.07 | −0.14 |
| pH | −0.19 | 0.11 | −0.30** | 0.17 | 0.69** | 0.57** | −0.03 | −0.26 | −0.20 |
| Turbidity | 0.04 | 0.78 | −0.32** | 0.01 | 0.30** | 0.57** | −0.05 | 0.30** | −0.04 |
| Color | 0.22 | −0.16 | 0.76** | −0.54** | −0.21 | −0.22 | 0.17 | −0.17 | −0.34** |
| Total suspended solids | 0.06 | 0.37** | −0.24 | 0.11 | 0.16 | 0.30** | −0.11 | 0.10 | 0.07 |
| NOx–N | −0.02 | 0.46** | −0.37** | −0.01 | 0.11 | 0.36** | 0.15 | 0.38** | 0.13 |
| NH4–N | −0.13 | −0.18 | 0.54** | −0.30** | 0.03 | −0.06 | 0.12 | −0.33** | −0.49** |
| Total Kjeldahl N | 0.02 | 0.09 | 0.43** | −0.27** | 0.18 | 0.07 | 0.23 | −0.39** | −0.49** |
| PO4–P | −0.01 | −0.15 | 0.67** | −0.55** | 0.10 | 0.05 | 0.16 | −0.21 | −0.40** |
| Total P | −0.05 | −0.13 | 0.63** | −0.48** | 0.22 | 0.09 | 0.14 | −0.32** | −0.48** |
| Total Fe | 0.41** | 0.50** | 0.13 | −0.14 | 0.14 | 0.16 | −0.08 | 0.00 | −0.13 |

** Significant at α = 0.01.
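As a hedged illustration of the two correlation procedures compared for Table 8, the sketch below computes a Pearson coefficient and a Spearman rank coefficient for one pair of variables. The function names, the pasture percentages, and the TP values are hypothetical and are not taken from the study; the significance test at α = 0.01 is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical annual values: pasture share of basin area (%) and mean TP (mg/L)
pasture_pct = [36.0, 43.0, 20.0, 15.0, 10.0]
tp_mg_l = [0.30, 0.35, 0.18, 0.12, 0.10]

print(f"Pearson r  = {pearson(pasture_pct, tp_mg_l):.3f}")
print(f"Spearman r = {spearman(pasture_pct, tp_mg_l):.3f}")
```

Pearson measures linear association, whereas Spearman depends only on the ordering of the values, so close agreement between the two suggests that the relationships are roughly linear and not driven by a few extreme observations.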
probably the most significant factor contributing to their higher DO, turbidity, TSS, and NOx–N and lower specific conductivity, color, and NH4–N than at G1 and G2 stations (Table 5). Specifically, the higher DO, turbidity, and TSS were probably related to enhanced aeration of water and erosion of sediment under high-flow conditions. Figure 3 shows that increases in TSS and turbidity with flows at C44S80 can be well defined by linear relationships at monthly or annual time scales. The lower color, lower NH4–N, and higher NOx–N possibly arose because Lake Okeechobee water has a longer residence time than storm-water runoff originating from the local basins, thereby allowing photolysis and oxidation of these constituents in lake water (Doering and Chamberlain, 1999).

Another latent factor identified by PCA/FA was specific conductivity and TFe (Factor 4 in Table 6), and this factor was linked to rainfall and flow (Factor 3 in Table 7). Total Fe is mostly in particulate form, and positive correlation with rainfall and flow is probably transport related. Correlation of specific conductivity with flow and rainfall depth is, however, due to the irrigation practice on citrus lands. Since large-scale expansion of citrus in the 1960s, storage capacity in the drainage network has been inadequate to meet irrigation demands. In the C-23, C-24, and C-25 basins, a common practice is to use artesian well water from the Floridan Aquifer, a confined aquifer with high mineral content, as an irrigation supplement when surface water becomes limited. In contrast, supplemental irrigation water is supplied by Lake Okeechobee in the C-44 basin (South Florida Water Management District, 2004). The different supplemental irrigation sources lead to the spatial variation in specific conductivity across these basins. The higher value at G2 than at G1 corresponds with the greater citrus area in the C-25 basin than in the C-23 or C-24 basins (Table 2). The negative correlation of specific conductivity with rainfall (Table 8) also suggests that lower specific conductivity in high-rainfall years is associated with smaller supplemental irrigation supplies from the Floridan Aquifer.

Land Use and Storm-Water Retention
Land use and development have long been shown to influence surface water quality, with Osborne and Wiley (1988) linking urbanization to water quality changes in the St. Fork River in Illinois, Long and Plummer (2004) attributing high levels of specific conductance to dense residential development in the Wachusett Reservoir watershed in Massachusetts, and Yin et al. (2005) relating urban development patterns to shifts in pollution sources in Shanghai, China. Water quality has also been evaluated with respect to the amount of arable land and grassland (Ferrier et al., 2001), urban land and upland agriculture (Santos-Roman et al., 2003; Zampella et al., 2007), and land development intensity and impervious percentage (Carey et al., 2011). The two-step PCA/FA indicated that the nutrient-related factor (TP, PO4–P, NH4–N, TKN, and color) was associated with land management of pasture and citrus and the area of open water in a basin (Factor 1 in Tables 6 and 7). The C-23 and C-24 basins had the greatest percentage of pasture (Table 2). Mean concentrations of TP, PO4–P, NH4–N, and color in G1 were the highest (Table 5). Graves et al. (2004) reported that color in runoff from pasture was significantly higher than from citrus and urban land, and they attributed the high color to the leaching of organic materials (humic and tannic acids from vegetative decay) into surface water. Cattle wastes in pastures are subject to storm-water runoff, which is probably why TP, PO4–P, TKN, and NH4–N were positively related to pasture. In addition, the C-23 and C-24 basins receive applications of
Fig. 3. Linear regressions of total suspended solids (TSS) and turbidity (in nephelometric turbidity units, NTU) with flow at C44S80 on (A,B) a monthly time scale and (C,D) an annual time scale. All regressions are significant at α = 0.01.
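The kind of linear relationship shown in Fig. 3 can be sketched with an ordinary least-squares fit of TSS against flow. This is only an illustration under stated assumptions: the function name `linear_fit`, the flow units, and all data values are hypothetical, and the sketch does not reproduce the regression coefficients or the significance test of the study.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Coefficient of determination from residual and total sums of squares
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Hypothetical monthly mean flow (arbitrary units) and TSS (mg/L)
flow = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tss = [3.1, 4.9, 7.2, 8.8, 11.0, 13.1]

slope, intercept, r2 = linear_fit(flow, tss)
print(f"TSS = {intercept:.2f} + {slope:.2f} * flow, R^2 = {r2:.3f}")
```

A positive slope with a high coefficient of determination would be consistent with the transport-related interpretation given in the text, in which higher flows carry more suspended sediment.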