IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 4, JULY 2001
Multiresolution Forecasting for Futures Trading
Using Wavelet Decompositions
Bai-Ling Zhang, Richard Coggins, Member, IEEE, Marwan Anwar Jabri, Senior Member, IEEE, Dominik Dersch,
and Barry Flower, Member, IEEE
Abstract—In this paper, we investigate the effectiveness of a
financial time-series forecasting strategy which exploits the multiresolution property of the wavelet transform. A financial series
is decomposed into an overcomplete, shift-invariant, scale-related
representation. In transform space, each individual wavelet series
is modeled by a separate multilayer perceptron (MLP). To better
utilize the detailed information in the lower scales of wavelet coefficients (high frequencies) and general (trend) information in the
higher scales of wavelet coefficients (low frequencies), we applied
the Bayesian method of automatic relevance determination (ARD)
to choose short past windows (short-term history) for the inputs
to the MLPs at lower scales and long past windows (long-term
history) at higher scales. To form the overall forecast, the individual forecasts are then recombined by the linear reconstruction
property of the inverse transform with the chosen autocorrelation
shell representation, or by another perceptron which learns the
weight of each scale in the prediction of the original time series.
The forecast results are then passed to a money management
system to generate trades. Compared with previous work on
applying wavelet techniques together with neural networks to financial
time series, our contributions include 1) proposing a three-stage
prediction scheme; 2) applying a multiresolution prediction which
is strictly based on the autocorrelation shell representation; 3)
incorporating the Bayesian technique ARD with MLP training
for the selection of relevant inputs; and 4) using a realistic money
management system and trading model to evaluate the forecasting
performance. Using an accurate trading model, our system
shows promising profitability performance. Results comparing
the performance of the proposed architecture with an MLP
without wavelet preprocessing on ten-year bond futures indicate
a doubling in profit per trade ($AUD 1753 versus $AUD 819) and a Sharpe
ratio improvement of 0.732 versus 0.367, as well as significant
improvements in the ratio of winning to losing trades, thus
indicating significant potential profitability for live trading.
Index Terms—Autocorrelation shell representation, automatic
relevance determination, financial time series, futures trading,
multilayer perceptron, wavelet decomposition.
Manuscript received August 1, 2000; revised February 5, 2001. This work was
supported by the Australian Research Council and Crux Financial Engineering
Pty. Ltd.
B.-L. Zhang and R. Coggins are with the Computer Engineering Laboratory
(CEL), School of Electrical and Information Engineering, University of Sydney,
NSW 2006, Australia.
M. A. Jabri is with the Computer Engineering Laboratory (CEL), School
of Electrical and Information Engineering, University of Sydney, NSW 2006,
Australia and also with the Electrical and Computer Engineering Department,
Oregon Graduate Institute, Beaverton, OR 97006 USA.
D. Dersch and B. Flower are with the Research and Development, Crux Financial Engineering, NSW 1220, Australia.
Publisher Item Identifier S 1045-9227(01)05016-0.
I. INTRODUCTION
During the last two decades, various approaches have been developed for time series prediction. Among them,
linear regression methods such as autoregressive (AR) and autoregressive moving average (ARMA) models have been the
most used methods in practice [18]. The theory of linear models
is well known, and many algorithms for model building are
available.
Linear models are usually inadequate for financial time series as in practice almost all economic processes are nonlinear
to some extent. Nonlinear methods are widely applicable nowadays with the growth of computer processing speed and data
storage. Of the nonlinear methods, neural networks have become very popular. Many different types of neural networks
such as the MLP and radial basis function (RBF) networks have been proven to be universal function approximators, which makes neural networks attractive for
time series modeling, and for financial time-series forecasting
in particular.
An important prerequisite for the successful application of
some modern advanced modeling techniques such as neural networks, however, is a certain uniformity of the data [14]. In most
cases, a stationary process is assumed for the temporally ordered
data. In financial time series, such an assumption of stationarity
has to be discarded. Generally speaking, there may exist different kinds of nonstationarities. For example, a process may be
a superposition of many sources, where the underlying system
drifts or switches between different sources, producing different
dynamics. Standard approaches such as AR models or nonlinear
AR models using MLPs usually give best results for stationary
time series. Such a model can be termed global, as only one
model is used to characterize the measured process. When a series is nonstationary, as is the case for most financial time series,
identifying a proper global model becomes very difficult, unless
the nature of the nonstationarity is known. In recent years, interest has grown in local models for improving the prediction accuracy on nonstationary time series [25].
To overcome the problems of monolithic global models, an efficient alternative is to design a hybrid scheme incorporating
multiresolution decomposition techniques such as the wavelet
transform, which can produce a good local representation of the
signal in both the time domain and the frequency domain [13].
In contrast to the Fourier basis, wavelets can be supported on an
arbitrarily small closed interval. Thus, the wavelet transform is
a very powerful tool for dealing with transient phenomena.
There are many possible ways of incorporating wavelet transforms into financial time-series analysis and forecasting. Recently, some financial forecasting strategies have
been discussed that used wavelet transforms to preprocess the
data [1], [2], [19], [27]. The preprocessing methods they used
are based on the translation invariant wavelet transform [7] or
à trous wavelet transform [4], [23].
In this work, we have developed a neuro-wavelet hybrid
system that incorporates multiscale wavelet analysis into a
set of neural networks for multistage time series prediction.
Compared to the work in [11], our system exploits a shift
invariant wavelet transform called the autocorrelation shell
representation (ASR) [4] instead of the multiscale orthogonal
wavelet transform as was originally presented in [13]. It is
cumbersome to apply the commonly defined discrete wavelet transform (DWT) in real-time
time series applications due to its lack of shift invariance,
which plays an important role in time series forecasting. Using
a shift invariant wavelet transform, we can easily relate the
resolution scales exactly to the original time series and preserve
the integrity of some short-lived events [2].
Basically, we suggest the direct application of the à trous
wavelet transform based on the ASR to financial time series and
the prediction of each scale of the wavelet’s coefficients by a
separate feedforward neural network. The predictions for each scale proceed independently. The prediction results for the wavelet coefficients can be combined directly by
the linear additive reconstruction property of ASR, or preferably, as we propose in this paper, by another NN in order to
predict the original time series. The aim of this last network is
to adaptively choose the weight of each scale in the final prediction [11]. For the prediction of different scale wavelet coefficients, we apply the Bayesian method of automatic relevance
determination (ARD) [16] to learn the significance of different lengths of past window at each wavelet scale. ARD is a
practical Bayesian method for selecting the best input variables,
which enables us to predict each scale of wavelet coefficients
by an appropriate neural network, thus simplifying the learning
task as the size of each network can be quite small.
Compared with the previous work on applying wavelet techniques together with connectionist methods to financial time series in [1], [2], our contributions consist of 1) proposing a three-stage prediction scheme; 2) a multiresolution
prediction which is strictly based on the autocorrelation shell
representation; 3) selecting relevant MLP inputs from the
overcomplete shell representation using the Bayesian technique
ARD; and 4) demonstrating performance using a realistic
money management system and trading model.
This paper is organized as follows. In the next section, we
briefly describe the wavelet transform and the autocorrelation
shell representation. The principle of the Bayesian method of
ARD is also introduced. Section III presents our hybrid neurowavelet scheme for time-series prediction and system details.
The simulation results and performance comparisons over different data sets using a realistic trading simulator are summarized in Section IV, followed by discussion and conclusions in
Section V.
II. COMBINING BAYESIAN AND WAVELET BASED
PREPROCESSING
A. Discrete Wavelet Transform and Autocorrelation Shell Representation

Generally speaking, a wavelet decomposition provides a way of analysing a signal both in time and in frequency. If $f$ is a function defined on the whole real line, then, for a suitably chosen mother wavelet function $\psi$, we can expand $f$ as

$f(t) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} w_{jk}\, \psi_{jk}(t)$    (1)

where the functions $\psi_{jk}(t) = 2^{j/2}\psi(2^{j}t - k)$ are all orthogonal to one another. The coefficient $w_{jk}$ conveys information about the behavior of the function $f$, concentrating on effects of scale around $2^{-j}$ near time $k \cdot 2^{-j}$. This wavelet decomposition of a function is closely related to a similar decomposition [the discrete wavelet transform (DWT)] of a signal observed at discrete points in time.

The DWT has the property of being very good at compressing a wide range of signals actually observed in practice: a very large proportion of the coefficients of the transform can be set to zero without appreciable loss of information, even for signals that contain occasional abrupt changes of level or other behavior. It is this ability to deal with heterogeneous and intermittent behavior that makes wavelets so attractive. Classical methods of signal processing, by contrast, depend on an underlying notion of stationarity, for which methods such as Fourier analysis are very well adapted.

One problem with the application of the DWT in time-series analysis is that it suffers from a lack of translation invariance. This means that statistical estimators that rely on the DWT are sensitive to the choice of origin. This problem can be tackled by means of a redundant or nondecimated wavelet transform [7], [21]. A redundant transform based on an $N$-length input time series has an $N$-length resolution scale for each of the resolution levels of interest. Hence, information at each resolution scale is directly related at each time point. To accomplish this, we use an à trous algorithm for realizing shift-invariant wavelet transforms, which is based on the so-called autocorrelation shell representation [21], utilizing dilations and translations of the autocorrelation functions of compactly supported wavelets. The filters for the decomposition process are the autocorrelations of the quadrature mirror filter coefficients of the compactly supported wavelets and are symmetric.

By definition, the autocorrelation functions of a compactly supported scaling function $\phi$ and the corresponding wavelet $\psi$ are as follows:

$\Phi(x) = \int_{-\infty}^{\infty} \phi(y)\,\phi(y - x)\,dy, \qquad \Psi(x) = \int_{-\infty}^{\infty} \psi(y)\,\psi(y - x)\,dy.$    (2)

The family of functions $\Phi_{jk}(x) = 2^{-j/2}\Phi(2^{-j}x - k)$ and $\Psi_{jk}(x) = 2^{-j/2}\Psi(2^{-j}x - k)$, where $j = 1, \ldots, J$ and $k = 0, \ldots, N-1$, is called an autocorrelation shell. Then a set of filters $\{p_k\}$ and $\{q_k\}$ can be defined as

$p_k = \tfrac{1}{2}\Phi(k/2), \qquad q_k = \tfrac{1}{2}\Psi(k/2).$    (3)
Fig. 1. Illustration of the procedure for preparing data in the hybrid neuro-wavelet prediction scheme. Note that each time a segment of the time series is
transformed, only the last coefficient is retained.
Using the filters $p$ and $q$, we obtain the pyramid algorithm for expanding $s_0$ into the autocorrelation shell

$s_{j,k} = \sum_{l} p_l\, s_{j-1,\,k+2^{j-1}l}, \qquad w_{j,k} = \sum_{l} q_l\, s_{j-1,\,k+2^{j-1}l}, \qquad j = 1, \ldots, J.$    (4)

As an example of the coefficients $\{p_k\}$, for Daubechies's wavelets with two vanishing moments, the coefficients are $(-\tfrac{1}{32}, 0, \tfrac{9}{32}, \tfrac{1}{2}, \tfrac{9}{32}, 0, -\tfrac{1}{32})$.

A very important property of the autocorrelation shell coefficients is that signals can be directly reconstructed from them. Given the smoothed signal at two consecutive resolution levels, the detailed signal can be derived as

$w_{j,k} = s_{j-1,k} - s_{j,k}.$    (5)

Then the original signal $s_0$ can be reconstructed from the coefficients $\{w_j,\ j = 1, \ldots, J\}$ and the residual $s_J$:

$s_{0,k} = s_{J,k} + \sum_{j=1}^{J} w_{j,k}$    (6)

where $s_J$ is the final smoothed signal. At each scale $j$, we obtain a set of coefficients $\{w_{j,k},\ k = 1, \ldots, N\}$. The wavelet scale has the same number of samples as the signal, i.e., it is redundant. The set of values of $s_J$ provides a "residual" or "background." Adding $w_j$ to this, for $j = J, J-1, \ldots, 1$, gives an increasingly more accurate approximation of the original signal. The additive form of the reconstruction allows one to combine the predictions in a simple additive manner.

To make predictions we must make use of the most recent data. To deal with this boundary condition we use the time-based à trous filters algorithm proposed in [2], which can be briefly described as follows. Consider a signal $s(1), \ldots, s(t)$, where $t$ is the present time-point, and perform the following steps.

1) For index $k$ sufficiently large, carry out the à trous transform (4) on $s(1), \ldots, s(k)$, using a mirror extension of the signal when the filter extends beyond $k$.

2) Retain the coefficient values as well as the residual values for the $k$th time-point only: $w_1(k), \ldots, w_J(k), s_J(k)$. The summation of these values gives $s_0(k)$.

3) If $k$ is less than $t$, set $k$ to $k+1$ and return to Step 1).

This process produces an additive decomposition of the signal $s(k), s(k+1), \ldots, s(t)$, which is similar to the à trous wavelet transform decomposition on $s(1), \ldots, s(t)$. The algorithm is further illustrated in Fig. 1 and sketched in code below.
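For concreteness, the decomposition and its time-based variant can be written in a few lines of Python. This is our own illustrative sketch rather than part of the original system; it assumes NumPy, the Daubechies-2 autocorrelation filter quoted above, the mirror extension of step 1), and a series that is long relative to $2^J$:

```python
import numpy as np

# Autocorrelation filter of the Daubechies wavelet with two vanishing
# moments, as quoted above: offsets k = -3..3.
P = np.array([-1.0, 0.0, 9.0, 16.0, 9.0, 0.0, -1.0]) / 32.0

def atrous_decompose(s, J):
    """Eq. (4)-(6): return the detail series w_1..w_J and the residual s_J.
    A mirror extension handles the boundaries, as in step 1)."""
    s_prev = np.asarray(s, dtype=float)
    n = len(s_prev)
    details = []
    for j in range(1, J + 1):
        step = 2 ** (j - 1)                   # 'a trous' hole spacing at level j
        s_next = np.empty(n)
        for k in range(n):
            acc = 0.0
            for l in range(-3, 4):
                idx = k + step * l
                if idx < 0:                   # mirror at the left boundary
                    idx = -idx
                if idx >= n:                  # mirror at the right boundary
                    idx = 2 * (n - 1) - idx
                acc += P[l + 3] * s_prev[idx]
            s_next[k] = acc
        details.append(s_prev - s_next)       # w_j = s_{j-1} - s_j, eq. (5)
        s_prev = s_next
    return details, s_prev                    # sum(details) + s_prev == s, eq. (6)

def time_based_coeffs(s, J):
    """Time-based a trous transform [2]: at each time k, transform s[:k+1]
    and keep only the last coefficient of each series (cf. Fig. 1)."""
    s = np.asarray(s, dtype=float)
    rows = []
    for k in range(2 ** J, len(s)):
        d, res = atrous_decompose(s[: k + 1], J)
        rows.append([w[-1] for w in d] + [res[-1]])
    return np.array(rows)                     # columns: w_1 ... w_J, residual
```

Summing the returned detail series and the residual recovers the input series, which is the additive reconstruction property (6) used later to combine the forecasts.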
B. Application of Automatic Relevance Determination (ARD)
When applying neural networks to time series forecasting,
it is important to decide on an appropriate size for the
time-window of inputs. This is similar to a regression problem
in which there are many possible input variables, some of
which may be less relevant or even irrelevant to the prediction
of the output variable. For a finite data set, there may exist
some random correlations between the irrelevant inputs and
the output, making it hard for a conventional neural network
to set the coefficients for useless inputs to zero. The irrelevant
Fig. 2. Overview of the wavelet/neural net multiresolution forecasting system. w_1, ..., w_J are the wavelet coefficient series; c is the residual coefficient series.
inputs, however, will degrade the model’s performance. The
ARD method [16] gives us a principled way of choosing the
length of the past windows used to train the neural networks. In our hybrid
neuro-wavelet scheme, we apply ARD to choose a short-term
history for higher temporal resolution (i.e., a higher sampling
rate and higher frequencies) while a long-term history is
used for lower temporal resolution. Through this, substantial
information on both the “detailed” and “general” history of the
time-series can be effectively exploited.
ARD stems from a practical Bayesian framework for adaptive data modeling [15], in which the overall aim is to develop
probabilistic models that are well matched to the data, and make
optimal predictions with those models. Given a data set, neuralnetwork learning can be considered as an inference of the most
probable parameters for a model. In most cases, there are a number of advantages in introducing Bayesian optimization of model parameters [5]. In particular, Bayesian methods provide a means to explicitly model prior assumptions by constructing the prior distribution over parameters and model architectures. In neural-network learning problems with high-dimensional inputs, generalization performance can often be improved by selecting those
inputs relevant to the distribution of the targets. In the ARD
scheme, we define a prior structure with a separate prior variance hyperparameter associated with each input. These hyperparameters correspond to separate weight decay regularisers for
each input. In other words, ARD is effectively able to infer
which inputs are relevant and then switch the others off by automatically assigning large values to the decay rates for irrelevant inputs, thus preventing those inputs from causing significant overfitting.
The ARD scheme used in this paper approximates the posterior distribution over weights by a Gaussian distribution. Using
this approximation, the “evidence” for a nonlinear model can
be readily calculated by an iterative optimization to find the optimal values for the regularization parameters. The optimization
of these hyperparameters is interleaved with the training of the
neural-network weights. More specifically, the parameters are
divided into classes $c$, with independent scales $\alpha_c$. For a network having one hidden layer, the weight classes are: one class for each input, consisting of the weights from that input to the hidden layer; one class for the biases to the hidden units; and one class for each output, consisting of its bias and all the weights from the hidden layer. Assuming a Gaussian prior for each class, we can define $E_W^{(c)} = \frac{1}{2}\sum_{i \in c} w_i^2$; then the ARD model uses the prior

$p(\mathbf{w} \mid \{\alpha_c\}) \propto \exp\Big(-\sum_c \alpha_c E_W^{(c)}\Big).$    (7)
The evidence framework can be used to optimize all the regularization constants simultaneously by finding their most probable values, i.e., the maximum over $\{\alpha_c\}$ of the evidence $p(D \mid \{\alpha_c\})$. We expect the regularization constants for irrelevant inputs to be inferred to be large, preventing those inputs from causing significant overfitting.
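The interleaved re-estimation can be illustrated with a deliberately simplified sketch; this is not the authors' implementation, the name `W_in` is our own, and the crude choice of $\gamma_c$ below is an assumption. MacKay's evidence-framework update re-estimates each $\alpha_c$ from the weights in its class; here the number of well-determined parameters $\gamma_c$ is replaced by its upper bound $k_c$, whereas a full treatment would subtract a Hessian-based trace term:

```python
import numpy as np

# Simplified ARD re-estimation step, interleaved with network training.
# W_in[i]: current weights from input i to the hidden layer (one ARD class).
# alpha[i]: regularization constant for that class; the training objective
#           is E_D + sum_i alpha[i] * E_W^(i), with E_W^(i) as in eq. (7).
def ard_update(W_in, alpha):
    for i, w in enumerate(W_in):
        E_w = 0.5 * float(np.sum(w ** 2))   # E_W^(i) of eq. (7)
        gamma = w.size                       # crude: gamma_i ~ k_i (upper bound)
        alpha[i] = gamma / (2.0 * E_w + 1e-12)
    return alpha
```

Inputs whose weights are driven toward zero receive ever larger $\alpha_i$ and are effectively switched off, which is the behavior exploited in Section IV to prune the noisy wavelet inputs.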
III. HYBRID NEURO-WAVELET SCHEME FOR TIME-SERIES
PREDICTION
Fig. 2 shows our hybrid neuro-wavelet scheme for time-series prediction. Given the time series $x_1, x_2, \ldots, x_t$, our aim is to predict the $h$th sample ahead, $x_{t+h}$, of the series. That is, $h = 1$ for single step prediction; for each value of $h$ we train a separate prediction architecture. The hybrid scheme basically involves three stages, which bear a similarity to the scheme in [11]. In the first stage, the time series is decomposed into different scales by autocorrelation shell decomposition. In the second stage, each scale is predicted by a separate NN and, in the third stage, the next sample of the original time series is predicted, using the different scales' predictions, by another NN. More details are expounded as follows.
For time series prediction, correctly handling the temporal aspect of the data is our primary concern. The time-based à trous transform as described above provides a simple method. Here we set up an à trous wavelet transform based on the autocorrelation shell representation. That is, (5) and (6) are applied to successive values of $t$. As an example, given a financial index with 100 values, we hope to extrapolate into the future by one or more subsequent values. By the time-based à trous transform, we simply carry out a wavelet transform on values $x_1$ to $x_{100}$. The last values of the wavelet coefficients at time-point $t = 100$ are kept because they are the most useful values for prediction. The same procedure is repeated at time point $t = 101$, and so on. We empirically determine the number of resolution levels $J$, mainly depending on the inspection of the smoothness of the residual series for a given $J$; many of the high-resolution coefficients are noisy. Prior to forecasting, we thus obtain an overcomplete, transformed data set.
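The construction of the second-stage training sets from this transformed data can be sketched as follows (our own illustration; `window` would be the ARD-selected history length for the scale and `horizon` the value of $h$):

```python
import numpy as np

# Build (input window, target) pairs for one coefficient series wj, as used
# to train the per-scale MLPs: the inputs are the last `window` values and
# the target is the coefficient `horizon` steps ahead.
def make_scale_dataset(wj, window, horizon):
    X, y = [], []
    for t in range(window - 1, len(wj) - horizon):
        X.append(wj[t - window + 1 : t + 1])
        y.append(wj[t + horizon])
    return np.array(X), np.array(y)
```

Short windows are used at the noisy lower scales and long windows at the smooth higher scales, following the ARD results reported in Section IV.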
In Fig. 3, we show the behavior of the three wavelet coefficient series over a 100-day period for a bond futures series. The original time series and the residual are plotted at the top and bottom of the same figure, respectively. As the wavelet level increases, the
corresponding coefficients become smoother. As we will show
in the next section, the ability of the network to capture dynamical behavior varies with the resolution level.
In the second stage, different predictors are allocated to different resolution levels and are trained on the corresponding wavelet coefficient series $w_j$, $j = 1, \ldots, J$. All the networks used to predict the wavelets' coefficients share the same structure of a feedforward multilayer perceptron (MLP). The network for scale $j$ has $M_j$ input units, one hidden layer with $K_j$ sigmoid neurons, and one linear output neuron. Each neuron in the networks has an adjustable bias. The $M_j$ inputs to the $j$th network are the previous samples of the wavelet coefficients of the $j$th scale. In our implementation, each network is trained by the backpropagation algorithm using the scaled conjugate gradient (SCG) method, and a weight decay regularization of the form $\frac{\alpha}{2}\sum_i w_i^2$ was used [5].
The procedure for designing neural-network structure essentially involves selecting the input layer, hidden layer, and output
layer. A basic guideline that should be followed is Occam’s
razor principle, which states a preference for simple models.
The fewer weights in the network, the greater the confidence
that over-training has not resulted in noise being fitted. The selection of input layer mainly depends on the considerations of
which input variables are necessary for forecasting the target.
From the complexity viewpoint it would be desirable to reduce
the number of input nodes to an absolute minimum of essential
nodes. In this regard, we applied ARD to empirically decide the
number of inputs in each resolution level.
The optimum number of neurons in the hidden layer is highly
problem dependent and a matter for experimentation. In all of
our experiments, we set the number of hidden neurons by using
half the sum of inputs plus outputs. Accordingly, for 21 inputs
and one output, 11 hidden units are used. It is worth noting
that the selection of input and hidden layer neurons also determines the number of weights in the network and an upper limit
on the weight number is dictated by the number of training vectors available. A rough guideline, based on theoretical considerations of the Vapnik–Chervonenkis dimension, recommends
that the number of training vectors should be ten times or more
the number of weights [3].
Fig. 3. Illustration of the à trous wavelet decomposition of the closing price series. From top to bottom: normalized price, w_1, w_2, w_3, and residual series.
In the third stage, the predicted results of all the different scales, $\hat{w}_j$, $j = 1, \ldots, J$, and the residual are appropriately combined. Here we discuss four methods of combination. In the first method, we simply apply the linear additive reconstruction property of the à trous transform, as expressed in (6). The fact that the reconstruction is additive allows the predictions to be combined in an additive manner. In the following we denote it as method I.

A hybrid strategy can also be empirically applied to determine what should be combined to provide an overall prediction. In the second method, the predicted results of all the different scales are linearly combined by a single-layer perceptron in order to predict the desired following sample of the original time series. In order to improve the prediction accuracy, a multilayer perceptron (MLP) with the same structure as for wavelet coefficient prediction is employed for the price series itself, and the corresponding prediction results are incorporated into the third stage, as shown in Fig. 2. For brevity, we call this method II. Depending on the forecasting horizon used in the second stage, the number of inputs to the third-stage network is equal to the number of prediction outputs from the second stage. For example, if four resolution levels are exploited and an MLP for direct price prediction is incorporated in the second stage, then for forecasting horizon 7 the number of inputs to the third-stage perceptron is determined accordingly.
In our experiments we have also applied a third-stage MLP in place of the simple perceptron; the number of hidden neurons is again set to half of the sum of the number of inputs and outputs. We denote this as method III for the combination of prediction results from the second stage. For comparison purposes, we trained and tested an MLP on the original time series without the wavelet preprocessing stage, denoted as method IV.
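A minimal sketch of methods I and II follows (the notation is ours: `preds` holds the second-stage forecasts, one column per wavelet scale plus the residual, and `price_pred` the direct price forecast; a linear least-squares fit stands in for single-layer perceptron training, to which it is equivalent for a linear output unit):

```python
import numpy as np

def method_I(preds):
    # Method I: linear additive reconstruction, eq. (6) -- sum the forecasts.
    return preds.sum(axis=1)

def method_II_fit(preds, price_pred, target):
    # Method II: a single-layer perceptron (here fit by least squares)
    # learns the weight of each scale in the final prediction.
    X = np.column_stack([preds, price_pred, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def method_II_predict(preds, price_pred, coef):
    X = np.column_stack([preds, price_pred, np.ones(len(preds))])
    return X @ coef
```

Method III replaces the combining perceptron by a small MLP, and method IV bypasses the wavelet stage altogether.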
As pointed out in [3], target selection is an important issue
in applying neural networks to financial series forecasting. We
follow the guideline suggested by Azoff to minimize the number
of targets required for a given problem. A neural network whose
output neurons are reduced from two to one will have half
the number of network weights required, with important consequences for the generalization capability of the network. A
single output neuron is the ideal, as the network is focused on
one task and there is no danger of conflicting outputs causing
credit assignment problems in the output layer. Accordingly, we
prefer a forecasting strategy which proceeds separately for each
horizon in the second stage.
Fig. 4. Closing price for ten-year Treasury bonds traded on the Sydney Futures
Exchange.
IV. SIMULATIONS AND PERFORMANCES
Our simulations involved the closing prices of four different
futures contracts: The three-year and ten-year Treasury bonds
(sfe3yb, sfe10yb) traded on the Sydney Futures Exchange, the
Australian US dollar contract (cmedolaus) and the Swiss Franc
US dollar contract (cmesfus) traded on the Chicago Mercantile
Exchange. In order to derive a continuous time series from a set
of individual futures contracts, special care must be taken at the
expiry of a contract. The price change from one contract to the
next cannot be directly exploited in a trading system. Instead
a contract must be rolled from the expiry month to a forward
month. We found that the four securities we are considering
are characterized by a price gap at rollover in the range of the
close-to-close price variation. The concatenation of spot month
contracts is therefore a reasonable approximation. In Fig. 4, we
show the sfe10yb closing price over a ten-year period.
We study the approach of forecasting each wavelet-derived coefficient series individually and then recombining the marginal forecasts. Our objective is to perform seven-day-ahead forecasting of the closing price. As a byproduct, the corresponding price changes are simultaneously derived. To compare with other similar work in the literature, we also construct five-day-ahead forecasts of the relative price change, i.e., the relative difference percent (RDP) between today's closing price $p(t)$ and the closing price five days ahead, denoted $\mathrm{RDP}{+}5$, which is calculated as $\mathrm{RDP}{+}5 = 100\,[p(t+5) - p(t)]/p(t)$ [2]. The data sets used consist of the date, the closing price, and the target forecast. A separate MLP network is constructed for each level of the coefficient series. The scaled conjugate gradient (SCG) algorithm was used for training. As the residual series are quite smooth, we simply apply linear AR models to them.
At first, the raw price data require normalizing, a process of standardising the possible numerical range that the input vector elements can take. The procedure involves finding the maximum (max) and minimum (min) elements and then normalizing the price to the range [0, 1] [3]:

$\tilde{x}_t = \frac{x_t - \min}{\max - \min}.$    (8)
Since many of the high resolution coefficients are very noisy,
we applied the ARD technique to determine the relevant inputs
of the MLPs on different levels. Initially, each network had 21
inputs. The ARD scheme was used with a separate prior for
each MLP input. Hence, the regularization constants for noisy
inputs are automatically inferred to be large. In Table I we give
typical results of hyper-parameters for input variables when
applying ARD to MLPs on different levels. From the results
we can see that the first two level coefficients are noisy and
have little relevance to the target distribution. To exploit this
fact to further improve performance and reduce computational
complexity, we apply MLPs with variable input sizes to different
levels, as shown in Table II.
We decomposed all the time-series into four resolution levels
as the residual series become quite smooth. All training sets
consist of the first 2000 data values (one closing price per day).
For the sfe3yb and sfe10yb, we use the remaining 600 and 1000
data points for testing, respectively. For the Australian US dollar
contract (cmedolaus) and the Swiss Franc US dollar contract
(cmesfus), we use the remaining 600 data points for testing.
In Fig. 5, we show the one step ahead predictions for each of the four coefficient series $w_1$, $w_2$, $w_3$, and $w_4$ and the residual series over a 100-day period on a testing set (from Nov. 15, 1993 to April 13, 1994). As the residual series is very smooth, a simple AR model shows quite satisfactory prediction performance. The ability of the networks to capture dynamical behavior varies with the resolution level [2], and we can observe two facts. First, the higher the scale (e.g., $w_4$ is "higher" than $w_3$), the smoother the curve and thus the less information the network can retrieve. Second, the lower the scale, the more noisy and irregular the coefficients are, thus making the prediction more difficult. The smooth wavelet coefficients at higher scales play a more important role.
In Figs. 6 and 7, we illustrate the forecasting of one day ahead
price and price change series (RDP), respectively, using the prediction methods I, II, and IV as previously described. The different prediction methods show quite similar results on the same
testing set. But a close inspection reveals a better accuracy resulting from method II, i.e., using a perceptron to combine the
prediction results of wavelet coefficients. To quantitatively calculate the prediction performance, we used mean square error
(MSE) to describe the forecast performance for price prediction, which is defined as $\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}(x_t - \hat{x}_t)^2$, where $x_t$ is the true value of the sequence and $\hat{x}_t$ is the prediction. For price change prediction, we used two other measures. The first measure is the normalized mean squared error $\mathrm{NMSE} = \frac{1}{\sigma^2 N}\sum_{t=1}^{N}(x_t - \hat{x}_t)^2$, where $x_t$ is the true value of the sequence, $\hat{x}_t$ is the prediction, and $\sigma^2$ is the variance of the true sequence over the prediction period.

A second measure of interest for price change prediction is the directional symmetry (DS), i.e., the percentage of correctly predicted directions with respect to the target variable, defined as $\mathrm{DS} = \frac{100}{N}\sum_{t=1}^{N}\Theta(x_t \hat{x}_t)$, where $\Theta$ is the Heaviside unit-step function, namely, $\Theta(u) = 1$ if $u \ge 0$ and $\Theta(u) = 0$ otherwise. Thus, the DS provides a measure of the number of times the sign of the target was correctly forecast. In other words, $\mathrm{DS} = 50$ implies that the predicted direction was correct for half of all predictions.

TABLE I: HYPER-PARAMETERS FOR THE MLP NETWORK INPUTS ON DIFFERENT LEVELS FOR THE sfe10yb DATA SET. THE ORDER OF THE PARAMETERS IS FROM PAST TO FUTURE.

TABLE II: STRUCTURE OF MLPS ON DIFFERENT LEVELS.

Fig. 5. From top to bottom: one step ahead predictions for the four wavelet coefficient series w_1, w_2, w_3, and w_4 and the residual series c, over a 100-day period on the testing set. In each figure, the dashed line is the target series and the solid line is the prediction.

In Table III we used the sfe10yb data to compare the four prediction methods with regard to MSE performance for price prediction and NMSE and DS performance for price change prediction, respectively. From the results we can see that all four methods have similar performance with regard to the MSE for price prediction and the NMSE and DS for price change prediction, and that method II shows better generalization performance.

The evaluation of the overall system is a very important issue. By some performance measures, we can evaluate whether targets have been met and compare different strategies in a trading system. Criteria in setting up a trading strategy will vary according to the degree of risk exposure permitted, so the assessment criteria selected are a matter of choice, depending on priorities.

The most commonly used measure is the Sharpe ratio, which is a measure of risk-adjusted return [22]. Denoting the trading system return for period $t$ as $R_t$, the Sharpe ratio is defined to be

$\mathrm{SR} = \frac{\mathrm{Average}(R_t)}{\mathrm{Standard\ Deviation}(R_t)}$    (9)

where the average and standard deviation are estimated over the returns for periods $t = 1, \ldots, T$.
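The four evaluation measures above can be computed directly from the forecast and return series; the following is a sketch under our own notation (`x` is the target series, `xh` the prediction, and `r` the per-period trading returns):

```python
import numpy as np

def mse(x, xh):
    return np.mean((x - xh) ** 2)

def nmse(x, xh):
    # MSE normalized by the variance of the true sequence
    return np.mean((x - xh) ** 2) / np.var(x)

def ds(x, xh):
    # directional symmetry: percentage of correctly predicted signs,
    # with the Heaviside convention Theta(u) = 1 for u >= 0
    return 100.0 * np.mean(x * xh >= 0)

def sharpe(r):
    # eq. (9): mean return over its standard deviation
    return np.mean(r) / np.std(r)
```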
As another measure of interest we evaluate the quality of our
forecasts in a trading simulator. Trading results are simulated
using the risk evaluation and money management (REMM)
trade simulation environment that has been used in previous
simulations [10], [8]. REMM has been developed and tested
with the help of expert futures traders. It is currently used by a
number of financial institutions to analyze and optimize trading
strategies. A description of the functionality of REMM is given
in the following.

Fig. 6. Demonstration of one day ahead forecasting for the closing price of sfe10yb on a testing data set over a 100-day period (from Nov. 15, 1993 to April 13, 1994), using prediction methods I, II, and IV. In each figure, the solid line is the target price and the dashed line is the prediction.

Fig. 7. One step ahead price change (RDP) forecasts (dashed lines) versus the true RDP series (solid line) for a segment of 100 days in the testing set (from Nov. 15, 1993 to April 13, 1994). See the text for an explanation of prediction methods I, II, and IV.

REMM facilitates the testing of a trade entry strategy by accurately modeling the market dynamics. As an
input REMM requires the time and price information and a
sequence of trade entry signals. The latter is obtained from the
forecast output of the various prediction systems. Accurate and
realistic risk and trade management strategies can be selected
to test the quality of the prediction system. This includes the
consideration of transaction costs. They are incurred each time
a futures contract is purchased or sold. Slippage is a common
phenomenon in trading futures. It is the discrepancy between
the theoretical entry or exit price and the actual price. In
REMM slippage is modeled using a volatility based approach.
REMM allows the selection of a number of realistic trade exit
strategies like profit take and partial profit take at various target
levels, trade expiry and stop loss levels. The exit conditions,
e.g., target and stop loss levels, are dynamically adjusted in response to changing market conditions. Risk management strategies
are implemented by providing trading capital of $1 million and
applying risk limits of $10 000 for each trade.
For a given sequence of trade entry signals and a set of risk
and trade management parameters the trading system is simulated using a forward stepping approach. At each time step the
system is updated by checking for new trade entries and adjusting the exit conditions for open positions caused by the new
market price. When an exit condition is satisfied, e.g., due to a
target being reached or a stop loss level being hit, the open position is novated and the overall portfolio position is updated.
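The forward-stepping logic can be illustrated by a highly simplified skeleton; REMM itself is a proprietary simulator, so the `Position` class, the single fixed-stop exit rule, and the signal convention below are all our own assumptions, and REMM's dynamic exits, slippage model, and risk sizing are omitted:

```python
from dataclasses import dataclass

@dataclass
class Position:
    side: int      # +1 long, -1 short
    entry: float   # entry price
    stop: float    # stop loss level

def simulate(prices, signals, stop_frac=0.01):
    """At each time step, first manage the open position against the new
    market price, then act on a new entry signal (signals[t] in {-1, 0, +1})."""
    pnl, pos = 0.0, None
    for t in range(1, len(prices)):
        p = prices[t]
        if pos is not None and (p - pos.stop) * pos.side <= 0:
            pnl += pos.side * (p - pos.entry)   # exit condition met: close out
            pos = None
        if pos is None and signals[t] != 0:
            side = signals[t]
            pos = Position(side, p, p * (1 - side * stop_frac))
    return pnl
```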
TABLE III: FOR sfe10yb DATA, PREDICTION PERFORMANCES FROM THE FOUR DIFFERENT PREDICTION METHODS.

More than 50 different performance measures are derived that
allow assessment of the quality of the trading system over the
given training period. The most relevant measures are listed in
the following. The profit per trade is the average profit per trade
over the trading period. The win/loss ratio is the ratio of winning
trades to losing trades over the trading period. The Sharpe ratio is the ratio of the annualised monthly return to its standard deviation. The worst monthly
loss is the total of losses from trades in the worst calendar month.
An optimal trading strategy is derived from the training set and
applied to the test set.
Using the REMM simulator, we further compared the profitability related performances of the four forecasting methods,
namely, directly summing up the wavelet coefficients predictions from the linear reconstruction property (6) (method I),
using a perceptron (method II) or an MLP (method III) to combine the wavelet coefficients prediction and simply applying an
MLP without wavelet features involved (method IV). For the
ten-year bond contract on the test set (consisting of 1000 days of
TABLE IV: FOR sfe10yb DATA, COMPARISON OF THE PROFITABILITY RELATED PERFORMANCES FROM THE FOUR DIFFERENT FORECASTING METHODS.

TABLE V: PERFORMANCE COMPARISON FOR DIFFERENT DATA SETS.
data), the measures shown in Table IV were calculated to evaluate the performance of the system under realistic trading conditions. Table IV summarizes the profit per trade, the win/loss
ratio, the Sharpe ratio and the worst monthly loss. Each trade is
based on a number of contracts determined by the risk per trade.
From Table IV, it is obvious that method II has the highest
values of both Sharpe ratio (0.7321) and profit/loss ratio
(1.6307), together with a satisfactory number of trades and profit
per trade. Though a plain MLP (method IV) generates the most
trades, it yields the worst performance with regard to the profit
per trade, profit-loss ratio and Sharpe ratio. Simply combining
wavelet coefficients using (6) (method I) offers reasonable
results of profit per trade and profit-loss ratio, but leads to
the most conservative trading activity (only 71 trades in more
than three years!). Overall, we can recommend method II as a
practical forecasting strategy for a trading system.
We have also tested the neuro-wavelet prediction method on
the closing prices of other futures contracts: sfe3yb, cmedolaus
and cmesfus. In Table V, we show MSE for price prediction,
NMSE and DS for RDP series prediction, all for testing data
sets. Profit/loss results are given in Figs. 8 and 9 for the
sfe3yb and sfe10yb data, respectively. Prediction method I
was compared with method IV in Fig. 8 while method II was
compared with method I in Fig. 9. From these evaluations,
we can conclude that multiscale neural-network architectures
generally show better profitability than applying an MLP alone
and the hybrid scheme exploiting a second-stage perceptron
has the best performance.
V. DISCUSSION AND CONCLUSION
Forecasting of financial time series is often difficult and
complex due to the interaction of the many variables involved.
In this paper, we introduced the combination of shift invariant
wavelet transform preprocessing and neural-network prediction
models trained using Bayesian techniques at the different levels
of wavelet scale. We compared this with a conventional MLP by
simulation on four sets of futures contract data and determined
both forecasting performance and profitability measures based
Fig. 8. Comparison of profit/loss results from applying the neuro-wavelet
forecasting scheme method I and the MLP alone (method IV), using the
three-year Treasury bonds data sfe3yb. The solid line results from method
I by applying the linear reconstruction property (6) while the dashed line
corresponds to the plain MLP. (a) Profit and loss on the training set in $AUD against trading days. (b) Profit and loss for the testing set.
on an accurate trading system model. Our results show significant advantages for the neuro-wavelet technique. Typically, a
doubling in profit per trade, Sharpe ratio improvement, as well
as significant improvements in the ratio of winning to losing
trades were achieved compared to the MLP prediction.
Although our results appear promising, additional research is
necessary to further explore the combination of wavelet techniques and neural networks, particularly over different market
conditions. Financial time series, as we have noted, often show
considerable abrupt price changes; the extent of outliers often
decides the success or otherwise for a given model. While the
Fig. 9. Comparison of profit/loss results from applying neuro-wavelet
forecasting scheme method I and method II, using the ten-year bond (sfe10yb)
data. The solid line results from the hybrid architecture using a perceptron for
combining the wavelet coefficients (method II) while the dashed line is the
simpler architecture (method I), in which the wavelet coefficients are directly
summed up. (a) profit and loss on training set in AUD against trading days. (b)
profit and loss for the testing set.
prediction performance is improved, the neuro-wavelet hybrid
scheme is still a global model, which is susceptible to outliers.
Ongoing work includes 1) the integration of the time-based à trous filters studied here with a mixture of local experts model, which may explicitly account for outliers by special expert networks, and 2) direct volatility forecasting by a similar hybrid
architecture. Other research areas include the online adaptation of the network models, including the ARD hyper-parameters; the investigation of wavelet-based denoising techniques, together with solutions to the associated boundary condition problems in the online learning case, in order to further improve generalization performance; and the investigation of the joint optimization of the forecasting and money management systems.
REFERENCES
[1] A. Aussem and F. Murtagh, “Combining neural networks forecasts on
wavelet-transformed time series,” Connection Sci., vol. 9, pp. 113–121,
1997.
[2] A. Aussem, J. Campbell, and F. Murtagh, “Wavelet-based feature extraction and decomposition strategies for financial forecasting,” J. Comput.
Intell. Finance, pp. 5–12, Mar. 1998.
[3] A. M. Azoff, Neural Network Time Series Forecasting of Financial Markets. New York: Wiley, 1994.
[4] G. Beylkin and N. Saito, “Wavelets, their autocorrelation functions
and multiresolution representation of signals,” IEEE Trans. Signal
Processing, vol. 7, pp. 147–164, 1997.
[5] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford,
U.K.: Oxford Univ. Press, 1995.
[6] ——, “Bayesian methods for neural networks,” Tech. Rep. NCRG/95/009, 1995.
[7] R. R. Coifman and D. L. Donoho, “Translation-invariant de-noising,”
in Wavelets and Statistics, Springer Lecture Notes, A. Antoniades,
Ed. New York: Springer-Verlag, 1995.
[8] D. R. Dersch, B. G. Flower, and S. J. Pickard, “Exchange rate trading
using a fast retraining procedure for generalized radial basis function
networks,” in Proc. Neural Networks Capital Markets, 1997.
[9] R. J. Van Eyden, The Application of Neural Networks in the Forecasting
of Share Prices. Haymarket, VA: Finance & Technology Publishing,
1995.
[10] B. G. Flower, T. Cripps, M. Jabri, and A. White, “An artificial neural
network based trade forecasting system for capital markets,” in Proc.
Neural Networks Capital Markets, 1995.
[11] A. B. Geva, “ScaleNet—Multiscale neural-network architecture for time
series prediction,” IEEE Trans. Neural Networks, vol. 9, pp. 1471–1482,
1998.
[12] A. Kehagias and V. Petridis, “Predictive modular neural networks for
time series classification,” Neural Networks, vol. 10, pp. 31–49, 1997.
[13] S. G. Mallat, “A theory for multiresolution signal decomposition: The
wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell., vol.
11, pp. 674–693, 1989.
[14] K. R. Müller, J. Kohlmorgen, and K. Pawelzik, “Analysis of switching
dynamics with competing neural networks,” Univ. Tokyo, Tech. Rep.,
1997.
[15] D. J. C. MacKay, “A practical Bayesian framework for backpropagation
networks,” Neural Comput., vol. 4, pp. 448–472, 1992.
[16] ——, “Bayesian nonlinear modeling for the 1993 energy prediction
competition,” in Maximum Entropy and Bayesian Methods, Santa
Barbara 1993, G. Heidbreder, Ed. Dordrecht, The Netherlands:
Kluwer, 1995.
[17] T. Masters, Neural, Novel & Hybrid Algorithms for Time Series Prediction. New York: Wiley, 1995.
[18] S. Makridakis, S. C. Wheelwright, and R. J. Hyndman, Forecasting,
Methods and Applications, 3rd ed. New York: Wiley, 1998.
[19] F. F. Murtagh, “Wedding the wavelet transform and multivariate data
analysis,” J. Classification, vol. 15, pp. 161–183, 1998.
[20] V. Petridis and A. Kehagias, “Modular neural networks for MAP classification of time series and the partition algorithm,” IEEE Trans. Neural
Networks, vol. 7, pp. 73–86, 1996.
[21] N. Saito and G. Beylkin, “Multiresolution representations using the
auto-correlation functions of compactly supported wavelets,” IEEE
Trans. Signal Processing, 1992.
[22] W. F. Sharpe, “The Sharpe ratio,” J. Portfolio Management, pp. 49–58,
Fall 1994.
[23] M. J. Shensa, “The discrete wavelet transform: Wedding the à trous
and Mallat algorithms,” IEEE Trans. Signal Processing, vol. 40, pp.
2463–2482, 1992.
[24] M. R. Thomason, “Financial forecasting with wavelet filters and neural
networks,” J. Comput. Intell. Finance, pp. 27–32, Mar. 1997.
[25] A. S. Weigend and M. Mangeas, “Nonlinear gated experts for time series: Discovering regimes and avoiding overfitting,” Int. J. Neural Syst.,
vol. 6, pp. 373–399, 1995.
[26] P. Zuohong and W. Xiaodi, “Wavelet-based density estimator model for
forecasting,” J. Comput. Intell. Finance, pp. 6–13, Jan. 1998.
[27] Z. Gonghui, J.-L. Starck, J. Campbell, and F. Murtagh, “The wavelet
transform for filtering financial data streams,” J. Comput. Intell. Finance, pp. 18–35, June 1999.
[28] J. S. Zirilli, Financial Prediction Using Neural Networks. International Thomson Computer Press, 1997.
Bai-Ling Zhang was born in China. He received
the Bachelor of Engineering degree in electrical
engineering from Wuhan Institute of Geodesy,
Photogrammetry, and Cartography in 1983, the
Master of Engineering degree in electronic systems
from the South China University of Technology in
1987, and the Ph.D. degree in electrical and computer engineering from the University of Newcastle,
Australia in 1999.
From 1998 to 1999, he was a Research Assistant
with School of Computer Science and Engineering,
University of New South Wales, Australia. From 1999 to 2000, he worked as a
Postdoctoral Fellow with the Computer Engineering Laboratory (CEL), School
of Electrical and Information Engineering, University of Sydney. Currently, he
is a member of the research staff in the Kent Ridge Digital Labs (KRDL), Singapore. His research interests include artificial neural networks, image processing
and computer vision, pattern recognition, and time-series analysis and prediction.
Marwan Anwar Jabri (S’84–M’85–SM’94) was
born in Beirut, Lebanon, in 1958. He received the
Licence de Physique and Maîtrise de Physique
degrees from the Université de Paris VII, France, in
1981 and 1983, respectively. He received the Ph.D.
degree in electrical engineering at the University of
Sydney in 1988.
He was a Research Assistant during 1984 as part
of the Sydney University Fleurs Radiotelescope research team. He was appointed as Lecturer at the University of Sydney in 1988, Senior Lecturer in 1992,
Reader in 1994, and Professor in 1996. Since January 2000, he has been the
Gordon and Betty Moore endowed Chair Professor at the Electrical and Computer Engineering Department, Oregon Graduate Institute (OGI), Beaverton,
and Professor in Adaptive Systems at the University of Sydney, School of Electrical and Information Engineering. He was a Visiting Scientist at AT&T Bell
Laboratories in 1993 and the Salk Institute for Biological Studies in 1997. He is
author, coauthor, and editor of three books and more than 150 technical papers
and is an invited speaker at many conferences and forums. His research interests
include digital and analog integrated circuits, biomedical and neuromorphic engineering, and multimedia communication systems.
Dr. Jabri is a recipient of the 1992 Australian Telecommunications and Electronics Research Board (ATERB) Outstanding Young Investigator Medal. He is
on the editorial board of several journals. He is a member of INNS and a Fellow of the Institution of Engineers, Australia.
Dominik Dersch received the master's degree in physics from the Technical
University of Munich, Munich, Germany, in 1991 and the Doctorate degree in
Natural Science from the Ludwig Maximilians University, Munich, in 1995.
From 1996 to 1997, he was a Research Fellow in the Speech Technology
Group at the University of Sydney. From 1997 to 1999, he worked as a Senior
Forecasting Analyst in Electricity Trading for Integral Energy Australia. From
1999 to 2000, he was Head of Research and Development of Crux Cybernetics
and then of Crux Financial Engineering. He is currently a Senior Quantitative
Analyst at HypoVereinsbank in Munich. His research interests include statistical
physics, statistics, pattern recognition, classification, and time series analysis.
He has worked and published in areas including speech recognition and speech
analysis, remote sensing data analysis, medical image processing and classification, and financial time series analysis and prediction. He holds a license as
financial advisor with the Sydney Futures Exchange.
Richard Coggins (M’95) received the B.Sc. degree
in physics and pure mathematics in 1985 and the B.E.
Hons. degree in electrical engineering in 1987 from
the University of Sydney, Australia. He received the
Ph.D. degree in electrical engineering from the University of Sydney in 1997.
From 1988 to 1990, he worked at Ausonics Pty.
Ltd. in the diagnostic ultrasound products group. In
1990, he received the Graduate Management qualification from the Australian Graduate School of Management. He joined the University of Sydney as a Research Engineer in 1990. He is currently a Senior Lecturer at the School of
Electrical and Information Engineering at the University of Sydney. He was appointed as a Girling Watson Research Fellow in the Computer Engineering Laboratory in 1997. He was appointed as a Senior Lecturer in 2000. His research
interests include machine learning, time series prediction, low-power microelectronics, and biomedical signal processing.
Barry Flower (M’96) received the Bachelor of
Engineering and Bachelor of Computing Science
degrees from the University of New South Wales,
Australia, in 1987 and the Ph.D. degree in electronic
and computer engineering from the University of
Sydney, Australia, in 1995.
From 1990 to 1995, he was a Research Associate
and then Girling Watson Research Fellow in the
System Engineering and Design Automation Laboratory at the University of Sydney. From 1995 to
2000, he was a Founder and Joint Managing Director
of Crux Cybernetics and then Crux Financial Engineering. He is currently
Manager of E-Commerce, Strategic Development at Hong Kong and Shanghai
Banking Corporation. His research interests include connectionist techniques
applied to time series analysis, financial markets, speech recognition, and
autonomous robotic systems.