Chatzidimitriou Etal 2006 AMS PDF
Mark DeMaria
NOAA/NESDIS, Ft. Collins, Colorado
* Corresponding author address: Charles W. Anderson, Colorado State Univ., Dept. of Computer Science, Fort Collins, CO 80523, USA

1. Introduction

Based on the tropical cyclone (TC) forecasting literature, hurricane intensity prediction is one of the most challenging tasks. To date, the problem has been addressed through statistical models (SHIFOR, ST5D), statistical-dynamical models (SHIPS) and primitive equation numerical models (GFDL), predicting the intensity changes for up to five days. For the first two kinds of models, both multiple linear regression (MLR) and non-linear regression in the form of neural networks (NNs) have been applied with promising results (DeMaria et al. 2005, Castro 2004, Knaff et al. 2004, Baik and Hwang 2000, Baik and Hwang 1998).

On the other hand, the procedures for feature selection and for reporting the predictive performance of the derived models have not been investigated to a great extent, in the sense that (1) they vary widely, so comparisons between models are made on an ad-hoc basis; (2) the derived models have an inherent selection bias, i.e. they are allowed to peek at the test set during feature selection, prohibiting good generalization behavior (Ambroise and McLachlan 2002); and (3) they are unstable in terms of performance and understanding (Guyon and Elisseeff 2003). For example, it is often the case that in certain seasons the models perform extremely well and in others quite unsatisfactorily, while the set of features used is constantly updated, lowering the interpretability of the models.

Having the above in mind, the goal of this paper is twofold: (a) to build robust models; and (b) to build models that are explicitly or implicitly interpretable, delivering additional knowledge about the problem. Robustness in this context can be defined as a property of a model that performs efficiently, generalizes well and is parsimonious, in the sense that Ockham's razor implies: complexity (in our case, extra features) must pay for itself by giving a significant improvement in the error rate during the training procedure (Cristianini and Shawe-Taylor 2000). This principle is quantified in section 3.

Recently developed rule based regression schemes are also a focus of the work presented here. They are applied to the dataset in order to identify more elaborate structure behind the intensity predictions. MLR and NNs fail to provide the human expert with interpretable results regarding possible multiple interdependencies of the inputs and the output. In contrast, rule based methods are not only competitive with respect to prediction performance, but also support the capability of discovering multiple correlations in the dataset in an easy to read and validate manner. This could potentially aid the analytical formulation of the problem.

2. Data and Predictors

The present work is based on the predictors used to derive the Statistical Hurricane Intensity Prediction Scheme, or SHIPS from now on (DeMaria et al. 2005). A total of 37 variables were used to predict the intensity changes, measured as maximum sustained 1-minute surface winds. The variables can be found in Table A.1. They include climatology, persistence and synoptic parameters. The SHIPS model is based on the ``perfect prog'' approach, meaning its predictors are calculated from the ``best-track'' data prepared (post-processed) by the National Hurricane Center (NHC). Since the focus is on using alternative methodologies and on measuring the accuracy of the derived models, rather than estimating their operational skill, the perfect prog model is well suited. Some of the predictors are denoted as static (S), i.e. they are evaluated only at time t=0 and the same value is used for each time interval, while others are time dependent (T) and are averaged along the storm track from t=0 to the forecast interval. The reader is referred to Table A.1 for a categorization of the predictors as static or time-dependent. The training set was the full set of SHIPS predictors (Table A.1) for the period 1982 to 2003. The 2004 season was used as a test set. When a validation set was needed, a partition from the training set was cut off, avoiding peeking at the test set.

3. Model Development

3.1. Learning Curves

This section describes the procedure used in this study to produce robust models for accurately predicting the intensification of TCs. The methods are presented in the order applied to the dataset, since their outcomes guide the next steps.

The first method applied was that of learning curves (LCs), plots graphing the performance measure of a learning algorithm, y-axis, versus the number of
training examples, x-axis (Russell and Norvig 2003). With LCs one is able to (1) detect if there is a pattern governing the data that can be learned; (2) decide whether the data are sufficient to build robust models; (3) identify the number of folds (k) for performing a k-fold cross-validation procedure; and (4) learn something about the noise lying in the data.

In Figures 1 and 2 the learning curves for forecasts at 6 and 120 hours ahead, respectively, are presented. The particular LCs were produced using MLR on all the predictors. Also, for each of the percentages, 10 evaluations with random selections from the original dataset were made in order to calculate the standard errors (se) as well. The first observation is that the data are sufficient for building robust models, since there is no significant improvement in the mean of the root mean square error (RMSE) of the predicted versus the actual intensity changes as more samples are added. Moreover, the decrease of the mean and variance of the RMSE suggests that there is a pattern to be learned. After examining all 20 LCs (from predicting intensity changes 6 hours ahead to predicting 120 hours ahead, every 6 hours) for several random runs, a good value for k was found to be 3 (corresponding to a 66-33% split between training and testing for each of the three validations). Finally, it is evident that as one moves to predictions further ahead in the future (for example, 6 hours vs. 120 hours), the noise in the data increases, making predictions less accurate.

... without peeking the test set and biasing our selection. Several model assessment methods have been evaluated: the training error; five 3-fold cross-validation (CV) procedures, each one with a new random selection, and their average; the leave-one-season-out (LOSO) jackknife procedure; the leave-one-hurricane-out (LOHO) jackknife procedure; and the .632+ bootstrap method (B.632+) repeated 5 times (Tibshirani et al. 2002). The methods were applied to the training set to provide the predicted generalization error. Their performance was evaluated based on their accuracy in predicting the actual generalization error, provided by the dataset, using the formula:

    P = (|predictedGE - actualGE| / actualGE) * 100%    (1)
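The LOSO jackknife estimate and the P measure of Eq. (1) can be sketched as follows. This is a minimal illustration on synthetic data with made-up "seasons", not the SHIPS dataset; the names `loso_rmse`, `p_measure` and the MLR helpers are introduced here for the sketch only.

```python
import numpy as np

def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

def fit_mlr(X, y):
    # Multiple linear regression with an intercept, via least squares.
    A = np.c_[np.ones(len(y)), X]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_mlr(w, X):
    return np.c_[np.ones(len(X)), X] @ w

def loso_rmse(X, y, seasons):
    """Leave-one-season-out (LOSO) estimate of the generalization RMSE."""
    errs = []
    for s in np.unique(seasons):
        train, test = seasons != s, seasons == s
        w = fit_mlr(X[train], y[train])
        errs.append(rmse(predict_mlr(w, X[test]), y[test]))
    return float(np.mean(errs))

def p_measure(predicted_ge, actual_ge):
    """Eq. (1): relative error of a generalization-error estimate, in percent."""
    return abs(predicted_ge - actual_ge) / actual_ge * 100.0

# Synthetic stand-in for the predictors, with ten made-up "seasons".
rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.5, 0.0, 0.0])
X = rng.normal(size=(600, 5))
y = X @ beta + rng.normal(scale=2.0, size=600)
seasons = np.repeat(np.arange(1982, 1992), 60)

predicted_ge = loso_rmse(X, y, seasons)            # estimated on training data only
X_new = rng.normal(size=(200, 5))                  # held-out data for the "actual" GE
y_new = X_new @ beta + rng.normal(scale=2.0, size=200)
actual_ge = rmse(predict_mlr(fit_mlr(X, y), X_new), y_new)
print(round(p_measure(predicted_ge, actual_ge), 2))
```

Grouping the folds by season (rather than fully at random) keeps all samples of one storm season together, which is what protects the estimate from the optimistic bias discussed above.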
LOSO     14.63   10.98    6.84    8.34
LOHO     21.43   14.68   12.89   12.64
B.632+   22.24   38.25    7.69   22.38

Figure 3. The 1 se rule displayed for the backward elimination selection procedure. In this particular experiment (12 hours ahead), only one feature is selected.
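The learning-curve construction of section 3.1 can be sketched in the same spirit: MLR fit by least squares on growing random subsets of a training split, repeated 10 times per size to obtain a mean RMSE and its standard error. The data below are synthetic stand-ins for the 37 SHIPS predictors, and `learning_curve` is a name introduced for this sketch.

```python
import numpy as np

def learning_curve(X, y, fractions, n_repeats=10, test_frac=0.33, seed=0):
    """Mean test RMSE and its standard error for growing training subsets.

    Sketch of the LC procedure: MLR (least squares) is fit on random
    subsets of a training split and scored on a held-out split; each
    subset size is repeated n_repeats times.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(n * test_frac)
    means, ses = [], []
    for frac in fractions:
        scores = []
        for _ in range(n_repeats):
            perm = rng.permutation(n)
            test, train = perm[:n_test], perm[n_test:]
            sub = train[: max(int(len(train) * frac), X.shape[1] + 1)]
            A = np.c_[np.ones(len(sub)), X[sub]]            # MLR with intercept
            w, *_ = np.linalg.lstsq(A, y[sub], rcond=None)
            pred = np.c_[np.ones(n_test), X[test]] @ w
            scores.append(np.sqrt(np.mean((pred - y[test]) ** 2)))
        scores = np.asarray(scores)
        means.append(scores.mean())
        ses.append(scores.std(ddof=1) / np.sqrt(n_repeats))
    return np.asarray(means), np.asarray(ses)

# Synthetic stand-in for the 37 SHIPS predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(1500, 37))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=1500)
mean_rmse, se_rmse = learning_curve(X, y, fractions=[0.1, 0.25, 0.5, 1.0])
```

Plotting `mean_rmse` with `se_rmse` error bars against the subset sizes gives curves of the kind shown in Figures 1 and 2: flattening means indicate sufficient data, and shrinking variance indicates a learnable pattern.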
As mentioned earlier, the Ockham's razor principle is applied. For picking a particular set of features, we selected the smallest one whose LOSO error is at most 1 standard error above the minimum LOSO error (Hastie et al. 2001). The intuition behind the 1 standard error threshold is that the models should not be made more complicated unless the additional complexity improves performance (in our case, reduces the RMSE) by at least 1 standard error from the minimum. Figure 3 displays this rule for backward elimination. The rule is applied for the B, F, L and N methods. The F and B methods add and remove variables based on the F-statistic, while the N method is based on a heuristic that exploits the weights and the network structure of the NN. The GA tries to optimize the function given in (2).

Table 2 presents the performance of the FS techniques as an average over all 20 datasets, while Table A.1 lists the features selected by each of the techniques. For the best method (the genetic algorithm), non-linear features were added and the technique was re-applied (named GN1 and GN2). The Overall performance measure is based on both the actual GE and the difference between the actual GE and the estimated one:

    O = actualGE * (1 + |predictedGE - actualGE| / actualGE)    (3)

From the FS techniques, the genetic algorithm procedure outperformed all the other techniques. In particular, the set derived from GN1 is not only the best with respect to LOSO error, but also has the minimum number of features among the best performing methods. From Table A.1 one can see that VMAX, INCV, POT, SHRD, Z850 and LSHR are selected by at least 5 procedures, while SHIPS uses non-linear combinations of the three most selected features. SHIPS outperforms all other methods for the 2004 test set (and of course overall), but it is quite possible that the particular season was well suited to the SHIPS predictors. Training and testing on more seasons will be performed later in the study. The main conclusion drawn from this section is that a small set of features suffices to obtain good error rates. Additionally, the fact that the genetic algorithmic procedures found better subsets can be attributed to their capability of searching the space of features more thoroughly, identifying possibly redundant variables that help each other, and finding variables that are useless by themselves but useful in combination with others (Guyon and Elisseeff 2003). On the contrary, backward and forward elimination are brute force approaches, removing less important variables without establishing evidence that, even with the help of others, their contribution is minimal.

Table 2. Ranking the feature selection methods based on their mean RMSE performance on all 20 datasets. The asterisks denote models with non-linear features, while the number in parentheses gives the number of linear features.

Methods   LOSO    2004    Overall   Selected
Full      17.85   21.00   24.13     37
F         18.02   19.80   21.57     13
B         18.76   20.23   21.70      4
Lasso     23.82   24.68   26.29      4
G         17.41   19.13   20.86     12
GN1*      16.80   19.17   21.33     10 (8)
GN2*      16.85   19.76   22.66     19 (13)
N         19.19   21.10   23.02      7
SHIPS*    17.41   17.62   18.65     16 (13)

Table 3. Ranking the methods based on their mean RMSE performance on all 20 datasets, per forecast hour.

Hour   SHIPS   GN1     GN2     RF      RBA     NN      SVM
6       5.38    5.43    5.43    5.33    5.99    5.39    5.48
12      8.38    8.58    8.55    8.26    9.83    8.23    8.55
18     10.70   11.06   11.04   10.85   13.46   10.68   11.57
24     12.64   13.16   13.10   12.84   16.08   12.59   14.29
30     14.14   14.89   14.82   14.06   19.87   14.47   16.18
36     15.43   16.37   16.31   15.09   20.04   15.91   17.86
42     16.57   17.45   17.46   17.02   21.77   17.39   19.05
48     17.60   18.37   18.55   17.55   24.20   18.84   19.94
54     18.69   19.44   19.74   19.32   27.34   19.98   20.49
60     19.56   20.43   20.87   18.22   27.70   21.18   20.87
66     20.20   21.45   21.92   20.06   26.04   22.09   21.76
72     20.81   22.37   22.81   21.25   31.54   23.16   21.98
78     21.29   23.06   23.59   22.06   30.85   24.23   21.67
84     21.66   23.62   24.28   21.32   31.17   24.95   21.30
90     21.79   23.99   24.91   21.21   31.11   25.44   21.62
96     21.88   24.31   25.45   22.41   31.71   25.91   22.03
102    21.73   24.57   25.97   21.14   34.47   26.12   21.90
108    21.56   24.82   26.36   23.88   30.84   27.23   21.04
114    21.27   24.97   26.73   22.33   30.26   25.48   20.84
120    21.12   25.14   27.31   19.19   31.00   26.00   21.32
Mean   17.62   19.17   19.76   17.66   24.76   19.76   18.49
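The 1 standard error rule used above for picking a feature subset can be made concrete with a short sketch. The backward-elimination trace below (subset sizes, LOSO RMSEs and their standard errors) is hypothetical, chosen only to illustrate the selection logic.

```python
def one_se_rule(n_features, errors, std_errors):
    """Pick the smallest feature subset whose error is within one
    standard error of the minimum error (the '1 se' rule)."""
    best = min(range(len(errors)), key=lambda i: errors[i])
    threshold = errors[best] + std_errors[best]
    # Among subsets meeting the threshold, take the most parsimonious.
    ok = [i for i in range(len(errors)) if errors[i] <= threshold]
    return min(ok, key=lambda i: n_features[i])

# Hypothetical backward-elimination trace: subset size vs. LOSO RMSE.
sizes = [37, 20, 12, 6, 3, 1]
rmse  = [17.9, 17.5, 17.3, 17.4, 17.8, 19.0]
se    = [0.4, 0.4, 0.3, 0.3, 0.4, 0.5]
chosen = one_se_rule(sizes, rmse, se)
print(sizes[chosen])   # prints 6
```

Even though the 12-feature subset has the lowest RMSE, the 6-feature subset is within one standard error of it and is therefore preferred, exactly the trade-off Figure 3 depicts.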
B1. 6 hours
RuleFit:
Rule 1: 0.303 * INCV
Rule 2: if (-0.5 <= INCV <= +Inf) and (40.51 <= POT <= +Inf)
then increase intensity change by 0.674
Rule 3: if (-3 <= INCV <= +Inf) and (-Inf <= SHRG <= 25.67)
then increase intensity change by 0.4923
Rule 4: -0.00322 * E000
Rule 5: if (-Inf <= INCV <= 2) and (-Inf <= POT <= 99.64)
then decrease intensity change by 0.4695
Rule 6: if (-Inf <= ENSS <= 19) and (-Inf <= SHRD <= 9.475) and (-Inf <= POT <= 111.7)
then increase intensity change by 0.646
Rule 7: if (-7.25 <= REFC <= +Inf) and (29.9 <= POT <= +Inf)
then increase intensity change by 0.6681
Rule 8: if (-0.225 <= U200 <= +Inf) and (-Inf <= T200 <= -55.2) and (-Inf <= ENSS <= 41.25)
then increase intensity change by 0.7075
Rule 9: if (-1.5 <= INCV <= 3) and (-Inf <= POT <= 100.1)
then increase intensity change by 0.3703
Rule 10: if (-Inf <= VMAX <= 92.5) and (-Inf <= INCV <= -2.5) and (-Inf <= SPDY <= 0.4439)
then increase intensity change by 0.5261
RBA:
if VMAX='(-inf-30.5]' and T200='(-inf--51.625]' => increase by 1.1587
if VMAX='(-inf-30.5]' => increase by 1.1869
if INCV='(-0.5-0.5]' and T200='(-inf--51.625]' and SHRD='(-inf-18.925]' => increase by 1.117
if INCV='(-0.5-0.5]' and T200='(-inf--51.625]' and SPDX='(-inf--0.970603]' => increase by 0.8974
if INCV='(-0.5-0.5]' and SHRD='(-inf-18.925]' => increase by 1.035
if INCV='(-0.5-0.5]' and SPDX='(-inf--0.970603]' => increase by 0.7985
if INCV='(-0.5-0.5]' and LAT='(15.85-34.75]' and T200='(-inf--51.625]' => increase by 0.7169
if INCV='(-0.5-0.5]' and T200='(-inf--51.625]' => increase by 0.6783
if INCV='(-0.5-0.5]' => increase by 0.5926
if SPDX='(-inf--0.970603]' and POT='(102.968402-inf)' => increase by 1.7557
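Rule lists of this kind translate directly into code. The sketch below evaluates a handful of the 6-hour RuleFit terms above as an additive score, the way RuleFit combines linear terms and rule indicators; the intercept and the remaining rules are omitted, and the storm record is made up for illustration.

```python
def rulefit_6h_partial(x):
    """Partial 6-hour intensity-change score from Rules 1, 2, 4 and 5.

    `x` maps predictor names (INCV, POT, E000, ...) to values.
    """
    score = 0.0
    score += 0.303 * x["INCV"]                       # Rule 1 (linear term)
    if x["INCV"] >= -0.5 and x["POT"] >= 40.51:      # Rule 2
        score += 0.674
    score += -0.00322 * x["E000"]                    # Rule 4 (linear term)
    if x["INCV"] <= 2 and x["POT"] <= 99.64:         # Rule 5
        score -= 0.4695
    return score

# Made-up storm record for illustration.
storm = {"INCV": 5.0, "POT": 80.0, "E000": 1000.0}
print(round(rulefit_6h_partial(storm), 4))   # prints -1.031
```

Because each rule is a simple conjunction of interval tests, a forecaster can read off which conditions fired for a given storm, which is exactly the interpretability argument made in the introduction.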
B2. 120 hours
RuleFit:
Rule 1: 0.2532 * POT
Rule 2: if (42.5 <= VMAX <= +Inf) and (80.45 <= EPOS <= 150.6) and (13.25 <= SHRD <= 38.49)
then decrease intensity change by 6.239
Rule 3: if (-Inf <= VMAX <= 69) and (7.857 <= Z850 <= +Inf)
then increase intensity change by 5.797
Rule 4: if (-0.4143<= U200 <= +Inf) and (-Inf <= EPSS <= 74.71) and (-Inf <= ENSS <= 14.07)
then decrease intensity change by 6.194
Rule 5: if (2.578 <= LSHR <= +Inf) and (-Inf <= POT <= 72.64)
then decrease intensity change by 5.630
Rule 6: if (-Inf <= VMAX <= 77.5) and (103.1 <= EPOS <= +Inf) and (-Inf <= SHRG <= 23.05)
then increase intensity change by 5.381
Rule 7: if (-Inf <= EPOS <= 152.4) and (-Inf <= RHHI <= 45.31) and (20.88 <= SHRG <= +Inf)
then decrease intensity change by 6.465
Rule 8: if (-Inf <= T200 <= -51.81) and (-Inf <= SHRG <= 16.78)
then increase intensity change by 7.401
Rule 9: if (-Inf <= VMAX <= 87.5) and (14.12 <= ENSS <= +Inf) and (0.4048 <= REFC <= +Inf)
and (-Inf <= Z000 <= 83.5) and (3.662 <= LSHR <= +Inf)
then decrease intensity change by 5.159
Rule 10: if (-Inf <= VMAX <= 87.5) and (3499 <= E000 <= +Inf) and (146.5 <= SHTS <= +Inf)
and (-1.405 <= REFC <= +Inf)
then decrease intensity change by 5.641
RBA:
if POT='(30.594633-72.991362]' => increase by -10.1508
if VMAX='(-inf-41.5]' and SHRD='(14.330952-inf)' and SHRG='(16.802381-inf)' => increase by 19.2896
if VMAX='(-inf-41.5]' and SHRD='(14.330952-inf)' => increase by 19.3589
if LAT='(17.25-inf)' and POT='(72.991362-inf)' => increase by 19.1011
if VMAX='(-inf-41.5]' and SHRG='(16.802381-inf)' and LSHR='(3.887604-inf)' => increase by 19.9903
if VMAX='(41.5-72.5]' and LSHR='(3.887604-inf)' => increase by 1.7391
if SHRD='(14.330952-inf)' and SHRG='(16.802381-inf)' and POT='(72.991362-inf)' and LSHR='(3.887604-inf)' =>
increase by 14.5526
if VMAX='(-inf-41.5]' and SHRG='(16.802381-inf)' => increase by 22.7259
if SHRD='(14.330952-inf)' and SHRG='(16.802381-inf)' and POT='(72.991362-inf)' => increase by 15.4955
if SHRG='(16.802381-inf)' and POT='(72.991362-inf)' and LSHR='(3.887604-inf)' => increase by 16.0543