Scarbrough Et Al. 2023
Position Paper
Keywords: Machine learning; Sensor data; Time series; Green infrastructure; Bioretention columns; Real-time monitoring

Green infrastructure (GI) is cost-effective for managing urban runoff. However, inspection and maintenance of GI are an increasingly common burden for stormwater managers. For instance, bioretention cells, a popular type of GI, may clog, sometimes unexpectedly, and detection can be challenging due to their dispersed placement. Current inspection programs nationwide largely rely on time-intensive, manual, qualitative inspections. This study develops an approach for real-time monitoring and prediction of column performance. First, we conduct laboratory experiments to continuously collect soil moisture data using sensors at two different depths in bioretention column testbeds. Four design configurations are used that allow the water to drain differently through the column, hence acting as different environmental climates. Next, we develop machine learning models, i.e., long short-term memory (LSTM) models, to accurately predict current and future soil moisture levels. Our results suggest that the quality of predictions is overall high, but they vary across the configurations.
∗ Corresponding author.
E-mail address: [email protected] (A. Khojandi).
https://doi.org/10.1016/j.envsoft.2023.105638
Received 4 October 2021; Received in revised form 13 January 2023; Accepted 22 January 2023
Available online 1 February 2023
1364-8152/© 2023 Elsevier Ltd. All rights reserved.
K. Scarbrough et al. Environmental Modelling and Software 162 (2023) 105638

1. Introduction

Across the United States, urban stormwater causes surface water degradation, leading watershed managers to turn to Green Infrastructure (GI) as a way to clean runoff and bring more natural hydrologic regimes to streams. GI improves existing or builds new infrastructure, adding resiliency to aged systems and helping manage extreme weather caused by climate change. Research has shown that besides climate change benefits, GI provides many health benefits for residents by improving not only stormwater quality and flood defense but also wildlife habitat, urban heat island mitigation, and public green space (Kaluarachchi, 2020). GI is considered a distributed management technique, whereby interventions are placed throughout the watershed to provide local scale improvements.

One such GI approach that has been increasingly utilized over the past decade is bioretention. There are many design variations of bioretention that are used. Bioretention has been shown to reduce the quantity and improve the quality of urban runoff by utilizing natural processes. Despite these benefits, bioretention cells are prone to clogging from anthropogenic factors such as debris accumulation within the contributing watershed, plant die-off, and seasonal weather changes (Benedict et al., 2002). Additionally, these environmental variables vary from site to site. Thus, there are ongoing maintenance and inspection needs for these systems to ensure their functionality.

GI inspection and maintenance have increasingly received attention as a major component of many stormwater management programs. Current strategies for these programs heavily rely on manual inspection and maintenance with largely qualitative approaches. This entails performing hands-on inspections on each GI installation, one by one, across the entire city (Benedict et al., 2002; Angelstam et al., 2017). The distributed approach to GI implementation, noted above, complicates this task and increases the financial burden (Ryan, 2019). As an example, Metro Water Services in Nashville, TN, has multiple staff solely dedicated to this task. Any changes in a given practice's function between these inspections are difficult to anticipate or even identify.

Recently, interest has grown for real-time environmental monitoring in urban watersheds. Such activities may better quantify the performance of GI and allow more quantitative assessments as to when maintenance is needed. Further, by minimizing hands-on labor required by staff and saving maintenance time within the system, the stormwater management program can be more efficient and effective.

As an example, soil moisture data can be used to understand the timing and pattern of runoff movement through the soil profile. Tracking changes in these patterns over time can aid in understanding how
system function is shifting. Changes in soil moisture patterns that include slower time to peak and a recession with a flatter slope may indicate that clogging of the system is causing runoff to infiltrate more slowly, and thus the need for maintenance. These observations can be relayed back to inspectors to trigger additional actions.

Potentially complicating this approach is the wide array of GI designs utilized in practice. For bioretention alone, design variations include: (1) traditional, which are allowed to freely drain at the bottom of the soil profile, (2) internal water storage, where water ponds in the bottom of the cell before filling to the level of the drain if a given storm is large enough, and (3) real-time control, a new design approach that allows drainage to be instigated or stopped based on a set of control rules. Real-time control has recently become of interest due to the adaptability that it affords (Kerkez et al., 2016). Studies such as Persaud et al. (2019) have shown that real-time control can instigate favorable conditions for biogeochemical processes to optimize runoff treatment. However, these systems may not have soil moisture processes as regimented as standard designs, as opening and closing of the valve can influence infiltration processes in addition to the onset of rainfall/runoff.

Technological advancements now allow access to low-cost sensors which can be installed within each GI structure, allowing for real-time monitoring. Further, these real-time monitoring systems are increasingly becoming user-friendly tools, paving the way for use in real-world applications (Glasgow et al., 2004; Feuer, 1995). A study by Jim Gao (2014) showed that RTRM enables many opportunities for system improvements. In that study, neural networks (NN) learn the relationships from the actual operations data to model the system's function. The model is tested and validated using Google's data center. The results suggest that machine learning, specifically NN, can effectively and efficiently use existing sensor data to model data center performance (Gao, 2014). Other studies test methods such as Boolean logic-based multi-criteria analysis and suggest they are not as accurate in predicting future GI states. They study artificial neural network (ANN) and adaptive network-based fuzzy inference system (ANFIS) models and determine that ANN is 72 percent accurate, whereas ANFIS is only 65 percent accurate (Labib, 2019).

Internet of Things (IOT) has allowed for many water industry advancements. Intelligent Water is one of these advancements that is already operational in multiple cities. South Bend, Indiana, and Ann Arbor, Michigan, are examples within the United States of where such systems are implemented and show promise of improving the drainage system's function (Atzori et al., 2010). Globally, other countries have shown promise of implementing improved drainage systems as well. For example, Melbourne, Australia, which is a forest city, is planning on implementing GI to help combat the worsening urban heat island effect (UHI) and could benefit from a useful GI monitoring tool (Fuentes et al., 2021). Although advancements are being made, this is still a new field, and many applications have yet to be explored. For instance, can GI managers be informed of maintenance and inspection needs using these new technology applications? This application represents how innovative and valuable IOT can be for the water industry (Bumblauskas et al., 2017). As noted above, the application of IOT to the water industry has matured, which makes successful field deployment possible. Low-cost sensor nodes may make maintenance programs and GI inspections more efficient by measuring performance quantitatively and allowing year-long assessments. However, which tools can be used to interpret large amounts of data while keeping an overall low cost has yet to be determined.

This study conducts a proof-of-concept investigation of how data from sensors installed in bioretention can be analyzed and interpreted. The objectives of this study are to: (1) understand how the soil moisture levels in different bioretention columns relate to each other under the same conditions; (2) determine whether the number of sensors used in a given installation can be optimized; and (3) predict the future state of the bioretention column using historical and future soil moisture and rainfall sensor data. These objectives will build knowledge on using sensors to inform GI maintenance and pave the way for future research.

In this study, we develop time series models using Long Short-Term Memory (LSTM) networks (Hochreiter and Schmidhuber, 1997; Wojciech Zaremba and Vinyals, 2014). In the literature, various types of time series modeling exist. These include traditional statistical models such as Autoregressive Integrated Moving Average (ARIMA), and more advanced, NN-based models such as LSTM networks. ARIMA is simpler, and generally works well when there is a clear trend in the time series. However, in many complex cases, LSTM has been shown to outperform ARIMA, for example in forecasting wind speed (Shivani et al., 2019), web traffic (Shelatkar et al., 2020), and financial data (Siami-Namini et al., 2018), among others. In this study, we first perform some preliminary analysis to compare the performance of ARIMA and LSTM in our use case. Based on these results, LSTM outperforms ARIMA, partly because our data trends are rather complex, as the data are collected from sensors capturing the soil moisture in response to precipitation at different depths. Hence, we perform our main analysis using LSTMs.

To showcase the model performance, we exploit a column study. Notably, these columns represent a simplification of field conditions, which involve more complex hydrologic processes that may vary spatially across a bioretention. However, even if patterns are slightly different in a field installation, the methods herein are robust enough to be trained on those data. Further, this work represents a proof-of-concept to see how well soil moisture patterns can be modeled and predicted at an individual location within a bioretention (which is relatively well represented by the column). This is highly valuable as it allows an assessment as to how patterns change over time due to perturbations such as clogging of soil media.

2. Data

To carefully control test conditions, data collection for this case study took place inside a greenhouse. Bioretention cell function is mimicked using PVC columns constructed with materials per typical design guidance. Specifically, all columns use a 30 centimeter (cm) diameter column with a drain at the bottom. Biological fouling is a concern when building the columns, so drains are frequently utilized and monitored. Column construction and operation included sanding the column's interior walls to minimize preferential flow. Lastly, columns are filled by layers of #57 stone, pea gravel, sand, bioretention media, mulch, and a ponding zone. Each column is planted in a climate-controlled greenhouse at 15–27 °C.

The columns are continuously monitored for soil moisture using Meter Group's TEROS 10 sensors. Two sensors are placed within each column at depths of 30 and 60 cm from the top of the bioretention media. The sensors then gather the volumetric water content, namely soil moisture, in units of volume/volume (%v/v) every minute for 43 days (Anon, 2020). This time period was selected because: (1) the time period was not so long as to make the time-consuming methodology of the column study impossible, and (2) a reasonable number of rainfall events occurred during this period, allowing a realistic data set of soil moisture patterns. While more data are obviously better, we believe this data set contains variable storm sizes and variable dry periods between events and is thus reasonable for testing the machine learning approaches herein. Following data collection, we preprocess and cleanse the data by removing the missing data points after verifying that they are caused by sensor failure and/or sensor anomalies.

To provide a robust analysis, there are four types of column configurations representing four potential bioretention design configurations. There are five columns of each configuration for a total of 20 columns. Fig. 1 conveys how each configuration is constructed.

– Free Draining (FD): the drain on the bottom of the column is completely unobstructed, i.e., drains via gravity.
– Internal Water Storage (IWS): an upturned elbow in the pipe is attached to the drain on the bottom, which creates a 45 cm submerged zone at the bottom of the column.
– Soil Moisture (SM): uses a remote level controller with real-time monitoring of water depth to control the soil moisture level at a depth of 30 cm to maintain field capacity.
– Volume Control (VC): actively managed remotely in real-time by opening and closing the valves within the column to maintain a storage depth of 30 cm from the bottom of the bioretention media as measured by a pressure transducer.

To create artificial stormwater for use in the study, tap water was supplemented with chemicals to achieve typical stormwater runoff concentrations per Bratieres et al. (2008). Sediment collected from a local detention pond was sieved and added to the mixture. The artificial stormwater was continuously mixed while being added to the columns to ensure even distribution of constituents among the columns (Bratieres et al., 2008a). The detailed breakdown of the chemical concentrations is provided in Table 7 in the Appendix.

Application of the stormwater is performed to mimic the size and frequency of real historical rainfall events across a roughly six week period (July 31, 2019 to September 1, 2019) per data collected from the McGhee Tyson airport located in Knoxville, TN. The volume of water used for each application is calculated using these rainfall data and an assumed 20:1 watershed to practice area ratio. Overall, this resulted in 14 events (Brown et al., 2009). To ensure that similar concentrations of stormwater pollutants were observed in all columns, applications were made in three passes. That is, 1/3 of the application volume was applied to each column, followed by the next 1/3, etc. This ensures that any settling that occurred in the tank used to make the stormwater mixture would not be weighted unevenly to one group of columns. This process took 45 min per column. For the SM and VC design configurations, which relied on preemptive control of the system based on rainfall predictions, predictive data are obtained from the National Oceanic and Atmospheric Administration. If a rain event is expected for a given day, the predicted rainfall depth is sent to the columns the day prior via wireless communication. The columns then identify current sensor readings and decide whether water should be released to allow space for the predicted incoming rainfall. A Stevens pressure transducer is used to measure the water storage levels for the VC configuration. The valves are triggered to drain or retain water in order to maintain water storage levels at 30 cm. For more details on the experimental method and column operation, refer to Persaud et al. (2019).

3. Methods

In this study, we first use statistical techniques to establish the differences between various time series collected, and then create three experiments to examine the extent to which these data streams can be predicted accurately from historical/temporal data and/or other independent sources (i.e., other sensors).

To test for similarities among the data streams, we conduct analysis of variance (ANOVA). This test allows us to check whether the data streams, in aggregate, statistically follow the same or different distributions. More specifically, the null hypothesis is that the means of the data streams considered are statistically the same. Hence, rejecting the null hypothesis suggests that there are at least two group means that are statistically significantly different from each other.

We use two different modeling approaches in this study, namely, ARIMA and LSTM. The ARIMA model is defined as follows,

\hat{y}_t = \mu + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} - \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q}    (1)

where p denotes the order of the autoregressive (AR) term, q is the order of the moving average (MA) term, and d is the number of non-seasonal differences required to make the series stationary (Ho and Xie, 1998).

Furthermore, LSTM, which is a special type of recurrent neural network (RNN) (Hochreiter and Schmidhuber, 1997; Wojciech Zaremba and Vinyals, 2014), is defined as follows,

(Forget Gate)   f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)
(Input Gate)    i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)
(Output Gate)   o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)
(Cell State)    c_t = f_t \odot c_{t-1} + i_t \odot \sigma_c(W_c x_t + U_c h_{t-1} + b_c)
(Hidden State)  h_t = o_t \odot \sigma_c(c_t)

where x_t \in \mathbb{R}^d is the input vector to the LSTM unit (where d refers to the number of input features), W \in \mathbb{R}^{h \times d}, U \in \mathbb{R}^{h \times h}, and b \in \mathbb{R}^h are weight matrices and bias vector parameters (where h refers to the number of hidden units), and \odot denotes the element-wise product. The forget gate decides what information can move to the next layers by disregarding any information not needed
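As a minimal illustration of the gate equations above, the following sketch computes a single LSTM time step in plain Python. It assumes the common choices of the logistic sigmoid for \sigma_g and tanh for \sigma_c, and element-wise products between gate vectors; these choices, the list-based parameter layout, and all function names are assumptions for illustration and are not taken from the study's software.

```python
import math


def sigmoid(z):
    """Logistic function, assumed here for the gate activation sigma_g."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + bi
        for W_row, U_row, bi in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps "f", "i", "o", "c" to (W, U, b) triples."""

    def gate(name, activation):
        W, U, b = params[name]
        return [activation(z) for z in affine(W, x_t, U, h_prev, b)]

    f_t = gate("f", sigmoid)        # forget gate
    i_t = gate("i", sigmoid)        # input gate
    o_t = gate("o", sigmoid)        # output gate
    c_tilde = gate("c", math.tanh)  # candidate cell values, sigma_c = tanh
    # Cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # Hidden state: gated, squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hypothetical example: d = 2 input features (soil moisture, rainfall), h = 1.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {name: zero for name in ("f", "i", "o", "c")}
h_t, c_t = lstm_step([0.3, 0.0], h_prev=[0.0], c_prev=[1.0], params=params)
```

With all weights set to zero, every gate evaluates to 0.5 and the candidate to 0, so the step simply halves the previous cell state; this makes the role of the forget gate easy to verify by hand.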
where y_j and \hat{y}_j are the response variable and the predicted value for sample j, respectively (Willmott and Matsuura, 2005). Note that the MAE metric was chosen because it produces a value representing the average error, which is in the same units as the response variable of interest. MAE has been used repeatedly in the literature for similar model evaluation purposes and provides a good measure for evaluating the average model performance error (Diouf et al., 2015).
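A short sketch of the MAE computation may help make the metric concrete. It uses the standard definition, the mean of |y_j - \hat{y}_j| over all samples; the function name and the sample readings are hypothetical and serve only as an illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |y_j - y_hat_j| over all samples, in the units of y."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Hypothetical soil moisture readings (v/v) and predictions.
observed = [0.20, 0.25, 0.30, 0.28]
predicted = [0.22, 0.24, 0.27, 0.30]
mae = mean_absolute_error(observed, predicted)  # about 0.02 v/v
```

Because the result is in the same units as the soil moisture readings, an MAE of about 0.02 here reads directly as an average miss of 0.02 v/v.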
First, we conduct a comparison between ARIMA and LSTM. To do so, we conduct an experiment where we predict future soil moisture using past temporal data (with a 70/30 data split). For this analysis, we report the MAE as well as the Nash–Sutcliffe efficiency (NSE). NSE is computed as one minus the ratio of the error variance of the modeled time series to the variance of the observed time series. NSE ranges from −∞ to 1, where the higher the NSE the better, i.e.,
NSE = 1 - \frac{\sum_{t=1}^{T} (Q_o^t - Q_m^t)^2}{\sum_{t=1}^{T} (Q_o^t - \bar{Q}_o)^2}    (3)
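The NSE in Eq. (3) can be sketched in a few lines of Python. The sketch assumes Q_o^t denotes the observed value at time t, Q_m^t the modeled value, and \bar{Q}_o the mean of the observations; the function name and the sample series are hypothetical.

```python
def nash_sutcliffe_efficiency(observed, modeled):
    """NSE = 1 - sum((Qo - Qm)^2) / sum((Qo - mean(Qo))^2)."""
    n = len(observed)
    if n == 0 or n != len(modeled):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    if total_ss == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - error_ss / total_ss


obs = [1.0, 2.0, 3.0]
perfect = nash_sutcliffe_efficiency(obs, [1.0, 2.0, 3.0])    # 1.0
mean_only = nash_sutcliffe_efficiency(obs, [2.0, 2.0, 2.0])  # 0.0
reversed_ = nash_sutcliffe_efficiency(obs, [3.0, 2.0, 1.0])  # -3.0
```

The three cases show the usual reading of the metric: 1 for a perfect fit, 0 for a model no better than predicting the observed mean, and negative values for a model that is worse than the mean.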
Fig. 3. Comparison of internal water content as a function of time for example columns across all four configurations considered at 30 cm (left) and 60 cm (right).
Fig. 4. All five columns in the Free Draining (FD) configuration compared to each other at 30 cm (left) and 60 cm (right).
Fig. 5. All five columns in the Internal Water Storage (IWS) configuration compared to each other at 30 cm (left) and 60 cm (right).
slope) whereas the IWS is the slowest (flattest slope). Note that FD and SM are relatively similar over time, as seen in Fig. 3.

Figs. 4–7 provide comparisons of the columns under the same treatment (for the complete breakdown of each column within all four configurations over the 43 day period, see Table 9 in the Appendix). For instance, the IWS column 4 at 60 cm has a fairly low mean and column 5 has a relatively high mean in comparison to the first three columns. Similarly, columns 1–3 in the volume control configuration at 60 cm are all similar to each other, but columns 4 and 5 are different. These outliers should be considered when interpreting the models' results because they may explain why the predicted soil moisture is worse when these columns are used for training. These outliers can be caused by random discontinuous sensor faults, random noise, or calibration errors (Baljak et al., 2012). Further, differences in column construction could lead to these results, such as variable amounts of compaction of the bioretention media (which could affect infiltration dynamics), differences in capillary action among the columns, and/or the presence of unintended preferential flow paths.

Fig. 4 compares each of the five FD columns at 30 cm versus 60 cm. The plots, when compared to each other, can show the water flow throughout the column. For FD, the two plots maintain similar patterns for each column, meaning the water is passing through each column at a controlled rate with minimal restrictions due to clogging. Furthermore, the water drains continuously over time solely due to gravity, as expected. FD columns act as one extreme where water flows through the column relatively quickly.

Fig. 5 shows IWS, which is the configuration that acts as the other extreme in comparison to FD. It purposefully slows and even stops drainage so that ponding occurs at the top of the columns. IWS columns perform as expected for columns 1–3 by having around a 0.1%v/v
Fig. 6. All five columns in the Soil Moisture (SM) configuration compared to each other at 30 cm (left) and 60 cm (right).
Fig. 7. All five columns in the Volume Control (VC) configuration compared to each other at 30 cm (left) and 60 cm (right).
Table 3
Experiment I model performance results (MAE) for all column configurations [FD: Free Draining, VC: Volume Control, IWS:
Internal Water Storage, SM: Soil Moisture].
Train/test Column 1 Column 2 Column 3 Column 4 Column 5
Column 1 – 0.012 ± 0.005 0.011 ± 0.006 0.012 ± 0.004 0.030 ± 0.013
Column 2 0.024 ± 0.012 – 0.024 ± 0.012 0.022 ± 0.012 0.035 ± 0.018
FD Column 3 0.011 ± 0.003 0.014 ± 0.006 – 0.013 ± 0.005 0.042 ± 0.021
Column 4 0.011 ± 0.007 0.010 ± 0.006 0.012 ± 0.005 – 0.042 ± 0.021
Column 5 0.041 ± 0.023 0.041 ± 0.023 0.042 ± 0.021 0.041 ± 0.021 –
Column 1 – 0.026 ± 0.010 0.036 ± 0.021 0.006 ± 0.003 0.013 ± 0.006
Column 2 0.021 ± 0.011 – 0.026 ± 0.013 0.023 ± 0.009 0.012 ± 0.007
VC Column 3 0.023 ± 0.017 0.021 ± 0.008 – 0.028 ± 0.020 0.021 ± 0.011
Column 4 0.008 ± 0.004 0.030 ± 0.011 0.038 ± 0.023 – 0.015 ± 0.004
Column 5 0.017 ± 0.007 0.018 ± 0.009 0.034 ± 0.021 0.014 ± 0.004 –
Column 1 – 0.024 ± 0.011 0.028 ± 0.015 0.022 ± 0.009 0.081 ± 0.044
Column 2 0.038 ± 0.022 – 0.042 ± 0.022 0.035 ± 0.022 0.097 ± 0.059
IWS Column 3 0.023 ± 0.013 0.032 ± 0.011 – 0.023 ± 0.015 0.042 ± 0.026
Column 4 0.021 ± 0.007 0.018 ± 0.010 0.028 ± 0.016 – 0.065 ± 0.034
Column 5 0.039 ± 0.016 0.051 ± 0.022 0.056 ± 0.025 0.034 ± 0.014 –
Column 1 – 0.047 ± 0.027 0.038 ± 0.021 0.047 ± 0.024 0.028 ± 0.015
Column 2 0.033 ± 0.014 – 0.033 ± 0.013 0.051 ± 0.024 0.030 ± 0.014
SM Column 3 0.029 ± 0.013 0.029 ± 0.016 – 0.058 ± 0.024 0.037 ± 0.018
Column 4 0.025 ± 0.016 0.041 ± 0.022 0.036 ± 0.019 – 0.028 ± 0.024
Column 5 0.031 ± 0.014 0.058 ± 0.024 0.052 ± 0.021 0.044 ± 0.019 –
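The train/test layout of Table 3 can be illustrated with a small evaluation loop: a model is fitted on one column and scored by MAE on every other column. The sketch below is one plausible reading of that protocol and not the study's code. In particular, it replaces the LSTM with a simple least-squares one-step-ahead predictor as a stand-in, and the column names and readings are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a * y[t-1] + b (stand-in for the LSTM)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def mae_ar1(model, series):
    """MAE of one-step-ahead predictions of the fitted model on a series."""
    a, b = model
    errors = [abs(y - (a * x + b)) for x, y in zip(series[:-1], series[1:])]
    return sum(errors) / len(errors)


def cross_column_mae(columns):
    """Return {(train, test): MAE} for every ordered pair of distinct columns."""
    results = {}
    for train_name, train_series in columns.items():
        model = fit_ar1(train_series)
        for test_name, test_series in columns.items():
            if test_name != train_name:
                results[(train_name, test_name)] = mae_ar1(model, test_series)
    return results


# Hypothetical 30 cm soil moisture readings for three columns.
columns = {
    "Column 1": [0.20, 0.22, 0.24, 0.26, 0.28],
    "Column 2": [0.30, 0.32, 0.34, 0.36, 0.38],
    "Column 3": [0.10, 0.15, 0.20, 0.25, 0.30],
}
table = cross_column_mae(columns)
```

Each key of `table` corresponds to one off-diagonal cell of a train/test matrix such as Table 3; the diagonal is left out because a column is not tested against itself in that experiment.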
Table 4
Experiment II model performance results (MAE) for all column configurations [FD: Free Draining, VC: Volume Control,
SM: Soil Moisture, IWS: Internal Water Storage].
Train/test Column 1 Column 2 Column 3 Column 4 Column 5
Column 1 – 0.032 0.020 0.010 0.016
Column 2 0.029 – 0.012 0.028 0.022
FD Column 3 0.016 0.017 – 0.014 0.017
Column 4 0.023 0.045 0.033 – 0.044
Column 5 0.016 0.016 0.010 0.013 –
Column 1 – 0.067 0.047 0.013 0.063
Column 2 0.017 – 0.018 0.023 0.085
VC Column 3 0.045 0.062 – 0.034 0.092
Column 4 0.095 0.108 0.088 – 0.074
Column 5 0.139 0.161 0.011 0.033 –
Column 1 – 0.037 0.065 0.176 0.447
Column 2 0.021 – 0.052 0.190 0.439
IWS Column 3 0.053 0.053 – 0.223 0.402
Column 4 0.148 0.167 0.220 – 0.568
Column 5 0.439 0.431 0.384 0.626 –
Column 1 – 0.012 0.015 0.013 0.009
Column 2 0.015 – 0.024 0.016 0.016
SM Column 3 0.009 0.011 – 0.007 0.008
Column 4 0.034 0.034 0.035 – 0.035
Column 5 0.070 0.064 0.071 0.070 –
configuration. We conduct a grid search to determine the best parameter values for both ARIMA and LSTM models. For ARIMA, we use p = 1, …, 8, q = 1, …, 8, and d = 1. For LSTM, we modify the number of neurons for the LSTM layer, ranging from 10 to 150 in increments of 10, and the dropout rate for the Dropout layer, ranging from 0 to 0.5 in increments of 0.1. The best parameters for ARIMA and LSTM are provided in Tables 10 and 11 in the Appendix.

Table 1 presents the results of the comparison between the ARIMA and LSTM models in terms of MAE for each column under the FD configuration. Similarly, Table 2 presents the results of the comparison in terms of NSE. Overall, the results show that predictions are more accurate when using LSTM. LSTM generates substantially lower MAE values (paired t-test p-value < 10^{-4}), and higher NSE values. Note that ARIMA produces negative NSE values, indicating that ARIMA is not an acceptable model for these data. Fig. 9 in the Appendix presents an example of this prediction for one of the columns. Based on these results, we only use LSTM in the remainder of this study.

4.3. Model performance for experiments I–III

Table 3 presents the results for Experiment I, where we predict the 30 cm soil moisture sensor data stream in column x using the 30 cm soil moisture sensor data stream in column y and rainfall, under all four configurations.

The row label shows which column is used to train the model, and the column header shows the column on which the model is tested. For instance, for the FD configuration, an MAE of 0.024 ± 0.012 is the result of Experiment I when column 2 is used for training and column 1 is used for testing. Overall, the MAEs are relatively small in every case, suggesting that relationships between soil moisture levels at the same depth can be captured to some extent using the LSTM model between columns with similar designs. Furthermore, this means it is possible to use sensor data to monitor bioretention columns in the same environment (meaning they are composed of the same media, have a surrounding area with the same type of soil, and are receiving similar precipitation) and show how they are functioning in real-time. Such an environment can range from several blocks to a few miles within a town or a city, depending on how it is built.
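The hyperparameter grids described above (p and q from 1 to 8 with d = 1 for ARIMA; 10 to 150 neurons in steps of 10 and dropout rates from 0 to 0.5 in steps of 0.1 for the LSTM) can be enumerated as in the sketch below. The `grid_search` helper and its scoring callback are hypothetical placeholders; in practice the callback would train a model and return its validation error.

```python
from itertools import product

# ARIMA orders (p, d, q) with p, q in 1..8 and d fixed at 1: 64 candidates.
arima_grid = [(p, 1, q) for p, q in product(range(1, 9), range(1, 9))]

# LSTM settings: 15 neuron counts times 6 dropout rates: 90 candidates.
lstm_grid = [
    {"neurons": n, "dropout": k / 10}
    for n, k in product(range(10, 151, 10), range(0, 6))
]


def grid_search(grid, score_fn):
    """Return the candidate with the lowest score (e.g., validation MAE)."""
    return min(grid, key=score_fn)


# Placeholder scoring function for illustration only; it does not train a model.
best = grid_search(lstm_grid, lambda cfg: abs(cfg["neurons"] - 50) + cfg["dropout"])
```

Exhaustive enumeration is affordable here because both grids are small; the cost is dominated by training each candidate, not by the search itself.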
Table 5
Experiment III (without future rainfall data) model performance results (MAE) for Free Draining configuration, 𝑡 minutes in the future.
𝑡 Train/test Column 1 Column 2 Column 3 Column 4 Column 5
Column 1 0.017 ± 0.009 0.018 ± 0.009 0.016 ± 0.008 0.016 ± 0.008 0.027 ± 0.019
Column 2 0.023 ± 0.012 0.021 ± 0.012 0.024 ± 0.014 0.027 ± 0.016 0.036 ± 0.019
30 Column 3 0.015 ± 0.008 0.018 ± 0.010 0.016 ± 0.008 0.018 ± 0.011 0.021 ± 0.011
Column 4 0.012 ± 0.006 0.016 ± 0.008 0.013 ± 0.006 0.016 ± 0.007 0.025 ± 0.011
Column 5 0.047 ± 0.026 0.044 ± 0.022 0.046 ± 0.023 0.047 ± 0.024 0.027 ± 0.015
Column 1 0.025 ± 0.013 0.024 ± 0.013 0.020 ± 0.011 0.022 ± 0.012 0.022 ± 0.013
Column 2 0.027 ± 0.013 0.026 ± 0.016 0.027 ± 0.015 0.028 ± 0.016 0.033 ± 0.017
60 Column 3 0.020 ± 0.014 0.023 ± 0.014 0.023 ± 0.012 0.022 ± 0.013 0.017 ± 0.011
Column 4 0.015 ± 0.009 0.018 ± 0.001 0.016 ± 0.008 0.021 ± 0.010 0.021 ± 0.010
Column 5 0.049 ± 0.026 0.048 ± 0.026 0.048 ± 0.026 0.048 ± 0.025 0.033 ± 0.018
Column 1 0.037 ± 0.022 0.034 ± 0.019 0.029 ± 0.017 0.030 ± 0.017 0.014 ± 0.009
Column 2 0.039 ± 0.021 0.041 ± 0.022 0.038 ± 0.020 0.041 ± 0.022 0.027 ± 0.015
120 Column 3 0.031 ± 0.019 0.031 ± 0.002 0.029 ± 0.017 0.029 ± 0.017 0.012 ± 0.008
Column 4 0.028 ± 0.014 0.028 ± 0.015 0.026 ± 0.013 0.031 ± 0.016 0.011 ± 0.007
Column 5 0.055 ± 0.029 0.057 ± 0.028 0.054 ± 0.027 0.052 ± 0.027 0.047 ± 0.025
Table 6
Experiment III (with future rainfall data) model performance results (MAE) for Free Draining configuration, 𝑡 minutes in the
future.
𝑡 Train/test Column 1 Column 2 Column 3 Column 4 Column 5
Column 1 0.017 ± 0.007 0.018 ± 0.008 0.013 ± 0.006 0.009 ± 0.004 0.031 ± 0.014
Column 2 0.016 ± 0.007 0.015 ± 0.008 0.014 ± 0.008 0.011 ± 0.005 0.032 ± 0.015
30 Column 3 0.018 ± 0.007 0.019 ± 0.011 0.017 ± 0.008 0.008 ± 0.004 0.031 ± 0.014
Column 4 0.018 ± 0.009 0.020 ± 0.010 0.016 ± 0.008 0.011 ± 0.004 0.031 ± 0.013
Column 5 0.008 ± 0.002 0.011 ± 0.006 0.008 ± 0.004 0.009 ± 0.004 0.017 ± 0.007
Column 1 0.022 ± 0.010 0.021 ± 0.010 0.018 ± 0.009 0.014 ± 0.006 0.035 ± 0.016
Column 2 0.021 ± 0.010 0.020 ± 0.009 0.019 ± 0.010 0.015 ± 0.007 0.035 ± 0.015
60 Column 3 0.020 ± 0.009 0.023 ± 0.013 0.022 ± 0.009 0.013 ± 0.006 0.032 ± 0.015
Column 4 0.021 ± 0.009 0.023 ± 0.012 0.019 ± 0.009 0.015 ± 0.006 0.033 ± 0.015
Column 5 0.008 ± 0.003 0.012 ± 0.006 0.008 ± 0.004 0.007 ± 0.004 0.024 ± 0.011
Column 1 0.028 ± 0.012 0.027 ± 0.014 0.025 ± 0.012 0.020 ± 0.008 0.039 ± 0.019
Column 2 0.028 ± 0.013 0.028 ± 0.013 0.024 ± 0.012 0.021 ± 0.009 0.039 ± 0.017
120 Column 3 0.028 ± 0.012 0.029 ± 0.016 0.027 ± 0.013 0.019 ± 0.008 0.036 ± 0.016
Column 4 0.027 ± 0.012 0.028 ± 0.015 0.025 ± 0.012 0.021 ± 0.010 0.035 ± 0.015
Column 5 0.010 ± 0.004 0.015 ± 0.009 0.012 ± 0.008 0.007 ± 0.004 0.030 ± 0.013
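Tables 5 and 6 report errors for predictions made t minutes ahead, without and with future rainfall as an input. One way such input/target pairs might be assembled from minute-resolution series is sketched below. The look-back window length, the feature layout, and the function name are assumptions made for illustration; the excerpt does not specify how the study framed its inputs.

```python
def make_horizon_pairs(moisture, rainfall, horizon, lookback, use_future_rain=False):
    """Build (features, target) pairs for predicting `horizon` steps ahead.

    features["past"] holds the last `lookback` (moisture, rainfall) readings;
    features["future_rain"], if requested, holds rainfall over the horizon.
    """
    pairs = []
    for k in range(lookback - 1, len(moisture) - horizon):
        start = k - lookback + 1
        features = {"past": list(zip(moisture[start:k + 1], rainfall[start:k + 1]))}
        if use_future_rain:
            features["future_rain"] = rainfall[k + 1:k + horizon + 1]
        pairs.append((features, moisture[k + horizon]))
    return pairs


# Hypothetical minute-by-minute readings.
moisture = [i / 10 for i in range(10)]
rainfall = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]

without_rain = make_horizon_pairs(moisture, rainfall, horizon=3, lookback=2)
with_rain = make_horizon_pairs(moisture, rainfall, horizon=3, lookback=2,
                               use_future_rain=True)
```

With one reading per minute, `horizon=30`, `60`, or `120` would correspond to the three prediction horizons in the tables; the small values above only keep the example easy to inspect.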
Table 4 shows the results for Experiment II, where the model predicts the 60 cm soil moisture sensor data stream in column x using the 30 cm soil moisture sensor data stream from column y and rainfall, under all four configurations.

For example, similar to Table 3, row one under FD presents the results when the data from column 1 are used to predict the response variable from column x, x = 2, …, 5. However, care must be taken when comparing the results of Experiments I and II. Recall that Experiment I predicts the 30 cm soil moisture data stream in one column using the 30 cm soil moisture in another column, whereas Experiment II predicts the 60 cm data stream from the 30 cm data stream, i.e., across two different depths. Hence, overall, the prediction task in Experiment I may be 'easier' than the one in Experiment II, and that may be why the results are slightly better in Experiment I. However, the results from the two experiments are not inherently comparable, as one uses cross-validation and the other does not. This changes the amount of data used in training and the evaluation process; hence, a direct comparison between results may not be meaningful.

Table 5 shows the MAE of Experiment III with the FD configuration without future rainfall, and Table 6 shows the MAE of Experiment III with the FD configuration with future rainfall. The results show that predictions are more accurate when future rainfall data are used in the model. The results also suggest that this model can accurately predict the soil moisture sensor data stream up to 60 min in the future, with the MAE averaging 0.020. Beyond 1 h, the model loses the desired relationship, and we can no longer accurately predict the soil moisture sensor data stream. However, as one can see in Fig. 8, even though the model can accurately predict the soil moisture sensor data stream up to 60 min in the future, it is most accurate between zero and 30 min.

Fig. 8. Results of Experiment III when predicting column 2 using column 1 with Free Draining configuration for t ∈ {0, 30, 60, 120} min in the future.
Fig. 9. Predicted vs. actual soil moisture readings for column 5 with Free Draining configuration at 30 cm (left) and 60 cm (right).

Table 7
Sediment contributions and stormwater target concentrations.
Source: Adopted from Bratieres et al. (2008b).
Constituent    Target concentration (mg/L)
NOx-N          0.75
NH4+-N         0.27
TDP            0.04
Cu2+           0.05
Zn2+           0.25
Pb2+           0.14
Cr6+           0.025
Mn2+           0.25
Fe3+           1
Ni2+           0.03
Cd2+           0.0045

5. Conclusion and future work

This study investigates how accurately soil moisture levels can be predicted in the future. The results suggest that if the dispersed bioretention cells are close enough to have similar climate conditions, one cell can be used to predict the state of multiple others up to one hour in the future. To clarify, a column is similar to having a core of a sandbox. A bioretention practice is an entire sandbox, and dispersed practices are like having sandboxes all over a playground. Hence, only one column needs to have sensors at both 30 and 60 cm, and all the surrounding columns only need one. This being said, stormwater managers can view the status of all the bioretention GI practices in
real-time with the use of monitors instead of performing hands on Sensors measuring water depth, temperature, vegetation health, and
inspections. This will save time, money, and resources for the given other biogeochemical attributes can be leveraged to track performance
companies. of GI to aid in optimized management of these systems. The methods
In the future, we aim to create machine learning algorithms that herein show great promise in characterizing and predicting such data
use data streams from low-cost sensors to enable predicting the future streams.
state of GIs. These predictions will pave the way for a number of ca-
pabilities for future infrastructure design and maintenance. First, if soil Software and data availability
moisture can be accurately predicted based on historical data, changes
in soil moisture patterns can also be identified. These changes can be Developer and contact information: Kalina Scarbrough, kalinascar-
linked to the need for maintenance as clogging of filter media will [email protected]
be identifiable by a longer time to peak and more flat receding limb. Year first available: 2021
Identifying changes in these patterns can instigate further inspection Operating system: OSX, Windows, or Linux
by notifying municipal staff. Second, these predictions can be used to Software required: Python 3.6.0+, numpy 1.21.0+, scipy 1.7.0+,
understand how bioretention, and other GI, will respond to incoming keras 2.4.0+
rain events. This can potentially allow for real-time control based on Availability and online documentation: https://github.com/kscarb
these predictions. Finally, soil moisture measurements in bioretention r3/Real-Time-Sensor-Based-Prediction-of-Soil-Moisture-in-Green-Infras
are only one of many potential applications for these approaches. tructure-A-Case-Study.git
Table 8
ANOVA tables showing statistical differences between each column [SS: sum of squares, df: degrees of freedom, MS: mean squares, F: F-statistic, P: P-value, F crit: F-critical].

Free Draining at 30 cm
Source of Variation   SS         df       MS        F            P      F crit
Between Groups        51.67      4        12.92     11394.35     0.00   2.37
Within Groups         353.05     311400   0.001
Total                 404.73     311404

Internal Water Storage at 30 cm
Source of Variation   SS         df       MS        F            P      F crit
Between Groups        124.41     4        31.10     17228.27     0.00   2.37
Within Groups         559.59     309960   0.002
Total                 684.00     309964

Soil Moisture at 30 cm
Source of Variation   SS         df       MS        F            P      F crit
Between Groups        77.28      4        19.32     16181.98     0.00   2.37
Within Groups         361.83     303050   0.001
Total                 439.12     303054

Volume Control at 30 cm
Source of Variation   SS         df       MS        F            P      F crit
Between Groups        22.80      4        5.70      6609.23      0.00   2.37
Within Groups         95.19      110368   0.001
Total                 117.99     110372

Free Draining at 60 cm
Source of Variation   SS         df       MS        F            P      F crit
Between Groups        158.08     4        39.52     54119.39     0.00   2.37
Within Groups         226.64     310376   0.001
Total                 384.72     310380

Internal Water Storage at 60 cm
Source of Variation   SS         df       MS        F            P      F crit
Between Groups        12379.38   4        3094.84   1290620.20   0.00   2.37
Within Groups         741.26     309122   0.002
Total                 13120.64   309126

Soil Moisture at 60 cm
Source of Variation   SS         df       MS        F            P      F crit
Between Groups        375.09     4        93.77     33507.63     0.00   2.37
Within Groups         848.08     303041   0.003
Total                 1223.17    303045

Volume Control at 60 cm
Source of Variation   SS         df       MS        F            P      F crit
Between Groups        136.83     4        34.21     11487.08     0.00   2.37
Within Groups         328.64     110363   0.003
Total                 465.47     110367
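As a worked check on how the entries in Table 8 relate to one another, the sketch below recomputes the F-statistic for the Free Draining configuration at 30 cm from its tabulated sums of squares and degrees of freedom, using the standard one-way ANOVA relationships MS = SS / df and F = MS(between) / MS(within). The class and function names are illustrative and are not from the authors' code. Because the tabulated SS values are rounded to two decimals, the recomputed F is expected to agree with the reported value only approximately.

```python
from dataclasses import dataclass


@dataclass
class AnovaRow:
    """Sums of squares and degrees of freedom for one ANOVA sub-table."""
    ss_between: float
    df_between: int
    ss_within: float
    df_within: int


def f_statistic(row):
    """One-way ANOVA F: ratio of between-group to within-group mean squares."""
    ms_between = row.ss_between / row.df_between
    ms_within = row.ss_within / row.df_within
    return ms_between / ms_within


# Values copied from Table 8, Free Draining at 30 cm.
fd_30cm = AnovaRow(ss_between=51.67, df_between=4,
                   ss_within=353.05, df_within=311400)

f_value = f_statistic(fd_30cm)
reported_f = 11394.35
f_critical = 2.37

print(f"recomputed F = {f_value:.1f}, reported F = {reported_f}")
print("columns differ significantly:", f_value > f_critical)
```

The recomputed value far exceeds the critical value of 2.37, consistent with the table's conclusion that the five columns within a configuration differ significantly from one another.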
Table 9
A statistical breakdown between columns in the same configuration over all 43 days.

Free Draining, 30 cm data
Statistic   Column 1   Column 2   Column 3   Column 4   Column 5
Mean        0.243      0.205      0.232      0.225      0.219
Std         0.031      0.034      0.033      0.023      0.034
Min         0.190      0.150      0.130      0.150      0.160
Max         0.410      0.400      0.410      0.330      0.350

Free Draining, 60 cm data
Statistic   Column 1   Column 2   Column 3   Column 4   Column 5
Mean        0.223      0.194      0.229      0.264      0.216
Std         0.025      0.032      0.029      0.029      0.019
Min         0.180      0.150      0.190      0.200      0.180
Max         0.370      0.400      0.410      0.420      0.330

Volume Control, 30 cm data
Statistic   Column 1   Column 2   Column 3   Column 4   Column 5
Mean        0.232      0.236      0.206      0.222      0.247
Std         0.033      0.028      0.034      0.034      0.037
Min         0.170      0.120      0.000      0.110      0.190
Max         0.360      0.370      0.330      0.330      0.440

Volume Control, 60 cm data
Statistic   Column 1   Column 2   Column 3   Column 4   Column 5
Mean        0.252      0.225      0.247      0.329      0.329
Std         0.063      0.035      0.061      0.058      0.058
Min         0.180      0.170      0.000      0.260      0.260
Max         0.460      0.410      0.440      0.530      0.530

Soil Moisture, 30 cm data
Statistic   Column 1   Column 2   Column 3   Column 4   Column 5
Mean        0.223      0.248      0.232      0.199      0.229
Std         0.032      0.033      0.036      0.043      0.027
Min         0.110      0.190      0.170      0.040      0.120
Max         0.390      0.420      0.430      0.330      0.340

Soil Moisture, 60 cm data
Statistic   Column 1   Column 2   Column 3   Column 4   Column 5
Mean        0.235      0.226      0.207      0.273      0.305
Std         0.029      0.021      0.023      0.060      0.093
Min         0.200      0.180      0.170      0.200      0.210
Max         0.440      0.370      0.390      0.480      0.520

Internal Water Storage, 30 cm data
Statistic   Column 1   Column 2   Column 3   Column 4   Column 5
Mean        0.277      0.243      0.239      0.217      0.229
Std         0.044      0.049      0.033      0.036      0.039
Min         0.200      0.170      0.170      0.160      0.160
Max         0.480      0.430      0.430      0.470      0.440

Internal Water Storage, 60 cm data
Statistic   Column 1   Column 2   Column 3   Column 4   Column 5
Mean        0.348      0.361      0.392      0.190      0.791
Std         0.054      0.019      0.053      0.041      0.650
Min         0.220      0.320      0.250      0.120      0.230
Max         0.430      0.440      0.460      0.320      0.840
Table 10
Best ARIMA parameters for Free Draining configuration at 30 cm and 60 cm.

Depth   Column     p   d   q
30 cm   Column 1   8   1   7
30 cm   Column 2   6   1   0
30 cm   Column 3   7   1   8
30 cm   Column 4   3   1   8
30 cm   Column 5   7   1   0
60 cm   Column 1   4   1   6
60 cm   Column 2   1   1   2
60 cm   Column 3   2   1   3
60 cm   Column 4   2   1   7
60 cm   Column 5   6   1   6

Table 11
Best LSTM parameters for Free Draining configuration at 30 cm and 60 cm.

Layer   Type      Best parameter
1       LSTM      neurons = 150
2       Dropout   dropout rate = 0.5
3       Dense     neurons = 1

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This research is partially supported by the University of Tennessee’s Institute for a Secure and Sustainable Environment (ISSE), United States and the National Science Foundation, United States Grant CNS-1737432.

Appendix

See Tables 7–11 and Fig. 9.
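The layer stack listed in Table 11 (an LSTM layer with 150 neurons, a dropout layer with rate 0.5, and a dense layer with one neuron) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the input shape, the number of input features, the optimizer, and the loss are assumptions that Table 11 does not specify, and the function names are hypothetical. The parameter-count helpers use the standard formulas for an LSTM layer (four gates, each with input weights, recurrent weights, and a bias) and a fully connected layer, and they run without Keras installed. Keras is imported only inside `build_model`.

```python
LSTM_UNITS = 150     # Table 11, layer 1
DROPOUT_RATE = 0.5   # Table 11, layer 2
DENSE_UNITS = 1      # Table 11, layer 3


def lstm_param_count(units, n_features):
    """Trainable weights in a standard LSTM layer: four gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * units * (n_features + units + 1)


def dense_param_count(n_inputs, units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * units + units


def build_model(n_timesteps, n_features):
    """Build the three-layer stack with Keras. The import is local so the
    arithmetic above can run without TensorFlow installed."""
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(n_timesteps, n_features)),
        keras.layers.LSTM(LSTM_UNITS),
        keras.layers.Dropout(DROPOUT_RATE),
        keras.layers.Dense(DENSE_UNITS),
    ])
    # Assumed training setup: MAE is the paper's reported metric, but the
    # optimizer and loss are not stated in Table 11.
    model.compile(optimizer="adam", loss="mae")
    return model


n_features = 1  # assumption: a single soil moisture stream as input
total_params = (lstm_param_count(LSTM_UNITS, n_features)
                + dense_param_count(LSTM_UNITS, DENSE_UNITS))
print(total_params)  # 91351
```

Under the single-feature assumption the network has 91,351 trainable parameters, almost all of them in the LSTM layer. Adding future rainfall as a second input feature would raise the LSTM count by 4 × 150 = 600.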