
Short-term wind power forecasting using ridgelet neural network

2011, Electric Power Systems Research

Rapid growth of wind power generation in many countries around the world in recent years has highlighted the importance of wind power prediction. However, wind power is a complex signal for modeling and forecasting. Despite the performed research works in the area, more efficient wind power forecast methods are still demanded. In this paper, a new prediction strategy is proposed for this purpose. The forecast engine of the proposed strategy is a ridgelet neural network (RNN) owning ridge functions as the activation functions of its hidden nodes. Moreover, a new differential evolution algorithm with novel crossover operator and selection mechanism is presented to train the RNN. The efficiency of the proposed prediction strategy is shown for forecasting of both wind power output of wind farms and aggregated wind generation of power systems.

Electric Power Systems Research 81 (2011) 2099–2107

Nima Amjady a, Farshid Keynia a, Hamidreza Zareipour b,∗
a Electrical Engineering Department, Semnan University, Semnan, Iran
b Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, Alberta, Canada

Article history: Received 2 June 2011; received in revised form 3 August 2011; accepted 4 August 2011; available online 9 September 2011.

Keywords: Wind power forecast; Ridgelet neural network; Differential evolution algorithm

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

In recent years, wind-based power generation has gained significant attention in many countries worldwide because of its environmental benefits. Total installed capacity of wind power in the world reached 159.2 GW at the end of 2009 [1]. In this year, wind generation was 340 TWh, about 2% of worldwide electricity usage.
It is estimated that by 2020, about 12% of the world's electricity will be supplied by wind generation [2]. In fact, wind-based generation is the fastest growing source of renewable energy [3]. However, despite significant environmental benefits, wind power can be highly variable because of the variable nature of the earth's atmosphere [3]. This variability can put power system reliability at risk, which in turn requires more backup conventional generation in the form of reserve and regulation services. The uncertainty of wind power also poses economic risks for wind farm owners, especially in competitive electricity markets. For instance, in the first week under the New Electricity Trading Arrangements (NETA) in the UK, the penalties incurred by inaccurate wind power forecasts resulted in net negative values for wind power generators [4].

A potential solution to these problems is to improve wind generation forecast accuracy. For this reason, different wind power forecast methods have been proposed in recent years. A review of these methods can be found in [5,6]. As some examples, in [7], artificial neural networks in combination with wavelet transform have been presented for short-term wind power prediction in Portugal, and the results obtained for a real world case study are illustrated there. An artificial neural network with adaptive Bayesian learning and Gaussian process approximation has been proposed for very short-term wind power prediction in [8]. It has been shown that the neural predictor outperforms the persistence technique for the test cases considered in [8].

∗ Corresponding author. Tel.: +1 403 210 9516; fax: +1 403 282 6855. E-mail address: [email protected] (H. Zareipour). doi:10.1016/j.epsr.2011.08.007
A two-step technique based on a Bayesian combination algorithm and three neural network models, namely, adaptive linear element network (ADALINE), backpropagation (BP) network, and radial basis function (RBF) network, has been presented in [9] for wind speed forecasting. Despite the performed research works in the area, wind power is a complex process to model and forecast, and more accurate wind power prediction methods are still required.

The new contributions of this paper can be summarized as follows:

1) A new prediction strategy for short-term wind power forecasting is proposed based on a ridgelet neural network (RNN) with a high function approximation capability.

2) A new differential evolution algorithm with a new crossover operator and selection mechanism is proposed to train the RNN. Benefiting from the proposed training algorithm, the RNN can efficiently extract the input/output mapping function of the wind power forecast process.

2. Proposed wind power prediction strategy

Fig. 1. Structure of the proposed wind power prediction strategy (a), structure of its forecasting engine (b). (Panel (a): candidate inputs; feature selection component based on mutual information (MI); selected inputs z1(t), z2(t), ..., zM(t); forecast engine consisting of the ridgelet neural network and the new differential evolution algorithm; forecasted wind power WP(t). Panel (b): the inputs z1(t), ..., zM(t) feed N hidden nodes $\psi\left(\frac{\langle U_i, Z(t)\rangle - b_i}{a_i}\right)$, $i = 1, \ldots, N$, whose outputs are summed with weights $w_1, \ldots, w_N$ to give WP(t).)

The structure of the proposed wind power prediction strategy is shown in Fig. 1(a); it is composed of a feature selection component and a forecast engine. Wind power is a nonlinear mapping function of many candidate inputs, including its past values and the forecast and past values of exogenous variables, e.g., wind speed and wind direction.
The feature selection component, based on the information theoretic criterion of mutual information (MI), refines this large set of candidate inputs. For this purpose, the MI based feature selector evaluates the mutual information between each candidate input and the forecast feature (here, wind power of the next time interval) using data mining techniques. The more mutual information a candidate input shares with the forecast feature, the higher the information value of that candidate for the forecast process. Thus, the feature selector can select a minimum subset of the most informative features and filter out unimportant candidate inputs with low information value for the wind power prediction process. This feature selection technique has been presented for price forecast of electricity markets in [10], wherein it has been shown that this method outperforms the well-known linear feature selection technique of correlation analysis, since the MI based feature selector can also evaluate nonlinear dependencies of the forecast feature on the candidate inputs. Moreover, an efficient approach, based on the entropy concept and binomial distribution, to compute MI between two random variables with low computation burden has been presented in [10]. Due to its ability to process nonlinear dependencies and its low computation burden, the MI based feature selector has been employed as the feature selection component of the wind power prediction strategy in this research work (Fig. 1(a)). However, since this feature selector is not a contribution of this paper, it is not further discussed here. The interested reader can refer to [10] for details of the MI based feature selection component.

The selected relevant candidate inputs for the wind power forecast process, denoted by z1(t), z2(t), ..., zM(t) in Fig. 1(a), are applied as inputs to the forecast engine. In the following, we focus on the new part, i.e., the forecast engine, of the proposed wind power prediction strategy.
Its structure and training mechanism are introduced in Sections 2.1 and 2.2, respectively. Then, the performance of the proposed wind power prediction strategy is described in Section 2.3.

2.1. Structure of the proposed forecast engine

The structure of the RNN applied as the forecast engine is shown in Fig. 1(b). Z(t) is a vector as follows:

$$Z(t) = [z_1(t), z_2(t), \ldots, z_M(t)] \tag{1}$$

Also, the function $\langle\cdot,\cdot\rangle$ in Fig. 1(b) represents the inner product of its two vector arguments. The activation functions of the hidden units of the RNN, denoted by $\psi(\cdot)$, are called ridgelets. Ridgelets are a set of basis functions, developed by Candes [11], to represent multivariate functions in an efficient and stable way. To define ridgelets, consider a smooth function $\psi: \mathbb{R} \to \mathbb{R}$ satisfying the following condition:

$$\int \frac{|\hat{\psi}(\omega)|^2}{|\omega|^M}\, d\omega < \infty \tag{2}$$

where $\hat{\psi}(\cdot)$ is the Fourier transform of $\psi(\cdot)$ and M indicates the dimension of the space (here, for the wind power forecast process, M is equal to the number of inputs of the RNN). Then, $\psi(\cdot)$ is called an admissible neural activation function. Based on $\psi(\cdot)$, the functions

$$\psi_\gamma(Z(t)) = \frac{1}{\sqrt{a}}\, \psi\left(\frac{\langle U, Z(t)\rangle - b}{a}\right) \tag{3}$$

are referred to as ridge functions or ridgelets [12–14], used as the activation functions of the RNN. The subscript $\gamma$ represents the parameters of the ridgelet as follows:

$$\gamma = (a, b, U), \quad a, b \in \mathbb{R}, \quad a > 0, \quad U \in US^{M}, \quad \|U\| = 1 \tag{4}$$

where $US^{M}$ indicates the unit sphere in the M dimensional space. In other words, a, b and U represent the scale, location and direction of the ridgelet, respectively.

Fig. 2. Wind power with respect to wind speed and direction for Sotavento wind farm in Spain (hourly data from October 24, 2008 to October 24, 2010).

Finally, the output of the RNN is obtained as follows:

$$WP(t) = \sum_{i=1}^{N} w_i \cdot \psi\left(\frac{\langle U_i, Z(t)\rangle - b_i}{a_i}\right) \tag{5}$$

where N indicates the number of hidden layer nodes as shown in Fig. 1(b).
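As a rough illustration of (5), the forward pass of the forecast engine can be sketched in a few lines of NumPy. This is only a sketch under stated assumptions: the activation `psi` below is a placeholder (a Mexican-hat shaped function) chosen for brevity, not necessarily the function used by the authors, and its admissibility under (2) is not checked; the names `rnn_output`, `w`, `a`, `b`, `U` and the toy values are illustrative.

```python
import numpy as np

def psi(x):
    # ASSUMPTION: placeholder activation (Mexican-hat shape), used only to
    # make the sketch runnable; it is not taken from the paper.
    return (1.0 - x**2) * np.exp(-0.5 * x**2)

def rnn_output(Z, w, a, b, U):
    """Eq. (5): WP = sum_i w_i * psi((<U_i, Z> - b_i) / a_i)."""
    Z = np.asarray(Z, dtype=float)   # input vector Z(t), shape (M,)
    proj = U @ Z                     # inner products <U_i, Z>, shape (N,)
    return float(np.sum(w * psi((proj - b) / a)))

# Toy network with M = 3 inputs and N = 2 hidden nodes (illustrative values).
U = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])      # unit-norm directions, one row per node
w = np.array([2.0, -1.0])            # output weights
a = np.array([1.0, 2.0])             # scales (a_i > 0)
b = np.array([0.0, 0.0])             # locations
Z = np.array([0.0, 0.0, 5.0])        # orthogonal to both directions

wp = rnn_output(Z, w, a, b, U)
print(wp)  # both projections are 0 and psi(0) = 1, so wp = 2 - 1 = 1.0
```

Because each hidden node only sees the projection of Z(t) onto its direction U_i, the node output is constant along every hyperplane orthogonal to U_i, which is what makes it a ridge function.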
At this stage, the underlying idea behind using the RNN as the wind power forecast engine can be described. A typical three dimensional graph of wind power WP(t) versus wind speed WS(t) and wind direction WD(t) is shown in Fig. 2. Although the number of selected inputs, i.e., M, may be more than two, variation of wind power with respect to more than two inputs cannot be illustrated on a single graph. Since wind speed and wind direction are important drivers for wind power [2], these two inputs are considered for this figure. Spatial inhomogeneity of wind power as a multivariate function can be seen from Fig. 2, in that most of the points are concentrated in a small subset of the space. Moreover, plane singularities of the wind power multivariate function can also be observed from this figure. In general, for a hypersurface F(x, y, z, ...) = 0, the singular points are those at which all the partial derivatives simultaneously vanish. As seen from Fig. 2, the partial derivatives of wind power with respect to its input variables approach zero and the surface becomes nearly flat on both sides of the surface. This is largely related to the characteristic curve of wind turbines [15]. In higher dimensional spaces including more inputs (such as lagged values of wind power, wind speed and wind direction in addition to WS(t) and WD(t)), this behavior of wind power leads to hyperplane singularities.

Now, observe from (5) that the output of the forecast engine is obtained by a linear combination of the ridgelets, where the weights $w_i$ of the RNN (Fig. 1(b)) can be considered as the coefficients of the combination. Ridgelets have proved to be an appropriate basis set for constructing multivariate functions with certain kinds of spatial inhomogeneities [16]. Moreover, in [12], it has been discussed that the RNN can deal with a wide range of functions, especially those with hyperplane singularities.
Approximating these functions by ridgelets converges more rapidly than traditional Legendre, Fourier and wavelet representations [17]. Thus, the RNN is an appropriate choice for the wind power forecast engine considering wind power characteristics.

Following several research works on the RNN, such as [13,17,18], we adopt the Meyer wavelet as the $\psi(\cdot)$ function to construct the ridge functions of the RNN based on (3). The Meyer wavelet is shown in Fig. 3. Its smoothness, sufficient decay, vanishing mean and oscillatory behavior with diverse oscillations can be seen from the figure, leading to the high capability of the Meyer wavelet to be chosen as the $\psi(\cdot)$ function (used for constructing the basis functions). A detailed discussion of the mathematical details of the Meyer wavelet and its advantages for constructing ridgelets can be found in [17,18]. It is noted that although the ridge functions are built from the Meyer wavelet, the proposed RNN is different from a wavelet neural network. A wavelet lacks a description of direction in high dimensions, while direction is an important attribute of data [12]. On the other hand, as seen from (3), a ridgelet defines direction in addition to scale and location in its expression.

2.2. Training mechanism of the proposed forecast engine

To use the RNN as the forecast engine, it should first be trained. In other words, its free parameters, denoted by the vector X,

$$X = [w_1, w_2, \ldots, w_N, a_1, a_2, \ldots, a_N, b_1, b_2, \ldots, b_N, U_1, U_2, \ldots, U_N] \tag{6}$$

should be determined. The first three parts of X, i.e., $w_1, w_2, \ldots, w_N$, $a_1, a_2, \ldots, a_N$ and $b_1, b_2, \ldots, b_N$, contain scalar variables. However, the last part, i.e., $U_1, U_2, \ldots, U_N$ (directions of the ridgelets), includes M dimensional vectors as follows:

$$U_1 = [u_{11}, u_{12}, \ldots, u_{1M}], \quad U_2 = [u_{21}, u_{22}, \ldots, u_{2M}], \quad \ldots, \quad U_N = [u_{N1}, u_{N2}, \ldots, u_{NM}] \tag{7}$$

Thus, the RNN has $N + N + N + M \cdot N = (M + 3) \cdot N = NP$ free parameters.
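The layout of X in (6)–(7) can be made concrete with a small sketch that splits a flat parameter vector back into its four parts. The helper name `unpack` and the dummy values are illustrative, and the row renormalisation used to satisfy $\|U_i\| = 1$ from (4) is an assumption; the text does not say how the unit-norm constraint is enforced during training.

```python
import numpy as np

def unpack(X, M, N):
    """Split X of Eq. (6) into w, a, b (length N each) and U (N x M)."""
    X = np.asarray(X, dtype=float)
    if X.size != (M + 3) * N:
        raise ValueError("X must hold NP = (M + 3) * N entries")
    w, a, b = X[:N], X[N:2 * N], X[2 * N:3 * N]
    U = X[3 * N:].reshape(N, M)      # rows are U_1, ..., U_N as in Eq. (7)
    # ASSUMPTION: enforce ||U_i|| = 1 from (4) by renormalising each row.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return w, a, b, U

M, N = 4, 3                          # 4 inputs, 3 hidden nodes
X = np.arange(1.0, (M + 3) * N + 1)  # NP = 21 dummy parameter values
w, a, b, U = unpack(X, M, N)
print(X.size, w, a, b, U.shape)
```

With M = 4 and N = 3 the vector has (4 + 3) × 3 = 21 entries: three weights, three scales, three locations and three direction vectors of length four.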
The goal of the training mechanism is to optimize these free parameters such that the training error of the RNN in learning the input/output mapping function of the wind power forecast process is minimized.

Fig. 3. Meyer wavelet function.

Training the RNN by traditional neural network learning algorithms (such as gradient search, Newton and quasi-Newton methods) is not an easy task, since computing the derivative of the Meyer wavelet (activation function), required in these learning algorithms, is computationally expensive. Moreover, these learning algorithms usually search the solution space in a specific direction (such as the error gradient direction [19]) and thus, when learning the nonlinear input/output mapping function of wind power, may be trapped in a local minimum (wherein the error gradient becomes zero). To remedy these problems, a new differential evolution algorithm is proposed in this paper for training the RNN. The proposed differential evolution algorithm (hereafter called new DE or NDE) does not require computation of the derivative of the Meyer wavelet. Moreover, the NDE can widely search the solution space in various directions and hence the training process is more likely to escape local minima. In the following, the classical differential evolution (DE) algorithm is first briefly described. Then, the proposed NDE is introduced. Its application for training of the RNN will be presented in the next subsection.

DE is a population based stochastic search technique developed by Storn and Price [20].
As an optimization technique, DE has attracted increasing interest in recent years due to its search capabilities and adaptability to different solution spaces. The classical DE can be summarized as the following step by step algorithm:

1) Initialization of DE population. Randomly initialize all individuals of the DE population within the given upper and lower limits. For training the RNN, each individual of the DE population has the structure shown in (6), including $NP = (M + 3) \cdot N$ parameters.

2) Compute the value of the objective function, denoted by OF(·), for each individual of the DE population. The training error of the RNN is considered as the OF(·) of DE that should be minimized.

3) Mutation operation. Consider the ith individual of the DE population in iteration k, denoted by $X_i^k$. The DE mutation operation generates a trial vector $V_i^k$ for it as follows:

$$V_i^k = X_{i1}^k + \beta \cdot (X_{i2}^k - X_{i3}^k), \quad i = 1, 2, \ldots, NI \tag{8}$$

where $X_{i1}^k$, $X_{i2}^k$ and $X_{i3}^k$ are three randomly selected individuals from the DE population in iteration k such that $i \neq i1 \neq i2 \neq i3$; NI indicates the number of individuals of the DE population; $\beta$ is the scaling factor of DE, controlling the amplification of the differential variation [20]. Theoretically, $\beta \in (0, \infty)$, but it is usually taken from the range [0.1, 1].

4) Crossover operator. Suppose that $V_i^k = [v_{i,1}^k, v_{i,2}^k, \ldots, v_{i,NP}^k]$ is the trial vector produced for the individual $X_i^k = [x_{i,1}^k, x_{i,2}^k, \ldots, x_{i,NP}^k]$. The DE crossover operator generates the offspring $Y_i^k = [y_{i,1}^k, y_{i,2}^k, \ldots, y_{i,NP}^k]$ corresponding to the parent $X_i^k$ as follows:

$$y_{i,j}^k = \begin{cases} v_{i,j}^k & \text{if } Rand \le CR \\ x_{i,j}^k & \text{if } Rand > CR \end{cases}, \quad 1 \le j \le NP, \quad 1 \le i \le NI \tag{9}$$

where CR indicates the crossover rate of DE in the range [0, 1] and the Rand function generates a random number uniformly distributed in the interval [0, 1]. For each $y_{i,j}^k$, a separate random number is generated by this function. After producing the offspring $Y_i^k$, its OF(·) value, $OF(Y_i^k)$, is computed.

5) Selection mechanism. DE has a deterministic selection mechanism to select $X_i^{k+1}$ of the next generation between $X_i^k$ and its offspring $Y_i^k$ as follows:

$$X_i^{k+1} = \begin{cases} Y_i^k & \text{if } OF(Y_i^k) < OF(X_i^k) \\ X_i^k & \text{otherwise} \end{cases}, \quad 1 \le i \le NI \tag{10}$$

6) Stopping condition. If the iteration number $k < k_{max}$ (maximum number of iterations), increment k and go back to step 3; otherwise return the best individual with the lowest value of OF(·) as the final solution.

In the above algorithm, $\beta$, CR, NI and $k_{max}$ are user defined settings of DE. More details about the classical DE algorithm can be found in [20].

To convert classical DE to the proposed NDE, a new crossover operator and a new selection mechanism (for steps 4 and 5 of the above algorithm) are introduced. As seen from (9), the crossover operator of the classical DE produces a single offspring $Y_i^k$, while the crossover operator of the proposed NDE generates three children, denoted by $Y_i^{k,1} = [y_{i,1}^{k,1}, y_{i,2}^{k,1}, \ldots, y_{i,NP}^{k,1}]$, $Y_i^{k,2} = [y_{i,1}^{k,2}, y_{i,2}^{k,2}, \ldots, y_{i,NP}^{k,2}]$ and $Y_i^{k,3} = [y_{i,1}^{k,3}, y_{i,2}^{k,3}, \ldots, y_{i,NP}^{k,3}]$, from each parent $X_i^k$ as follows:

$$y_{i,j}^{k,1} = x_{i,j}^k + Rand \times (v_{i,j}^k - x_{i,j}^k), \quad 1 \le j \le NP, \quad 1 \le i \le NI \tag{11}$$

$$y_{i,j}^{k,2} = x_{i,j}^k + Rand \times (x_{i,j}^k - v_{i,j}^k), \quad 1 \le j \le NP, \quad 1 \le i \le NI \tag{12}$$

$$y_{i,j}^{k,3} = v_{i,j}^k + Rand \times (v_{i,j}^k - x_{i,j}^k), \quad 1 \le j \le NP, \quad 1 \le i \le NI \tag{13}$$

Fig. 4. Comparison of the classical DE crossover and proposed NDE crossover: (a) classical DE crossover; (b) the first operator of the proposed NDE crossover shown in (11); (c) the second operator of the proposed NDE crossover shown in (12); (d) the third operator of the proposed NDE crossover shown in (13). (Each panel shows the jth dimension between $x_{i,j,min}$ and $x_{i,j,max}$ with the points $x_{i,j}^k$ and $v_{i,j}^k$ marked.)

To better illustrate the efficiency of the proposed crossover operator, the performance of the classical DE crossover and the above operators (11)–(13) is compared in Fig. 4(a)–(d).
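The mutation (8), the classical crossover (9) and the three NDE crossover operators (11)–(13) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names and the toy population are illustrative, drawing a separate Rand for every entry in (11)–(13) is an assumption (the text states this explicitly only for (9)), and clipping children to the limits $x_{i,j,min}$ and $x_{i,j,max}$ is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

def mutate(pop, i, beta):
    """Eq. (8): V_i = X_i1 + beta * (X_i2 - X_i3), with i, i1, i2, i3 distinct."""
    others = [j for j in range(len(pop)) if j != i]
    i1, i2, i3 = rng.choice(others, size=3, replace=False)
    return pop[i1] + beta * (pop[i2] - pop[i3])

def classical_crossover(x, v, cr):
    """Eq. (9): take v_j where Rand <= CR, otherwise keep x_j."""
    return np.where(rng.random(x.size) <= cr, v, x)

def nde_crossover(x, v):
    """Eqs. (11)-(13): three children per parent.

    ASSUMPTION: one independent Rand draw per entry and per child.
    """
    y1 = x + rng.random(x.size) * (v - x)   # Eq. (11): between x and v
    y2 = x + rng.random(x.size) * (x - v)   # Eq. (12): beyond x, away from v
    y3 = v + rng.random(x.size) * (v - x)   # Eq. (13): beyond v, away from x
    return y1, y2, y3

pop = rng.uniform(-1.0, 1.0, size=(6, 4))   # NI = 6 individuals, NP = 4 parameters
x = pop[0]
v = mutate(pop, 0, beta=0.5)
y = classical_crossover(x, v, cr=0.9)
y1, y2, y3 = nde_crossover(x, v)
```

In each dimension the classical child can only take the value of x or v, whereas the three NDE children fall inside the segment between x and v and on either side of it.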
In Fig. 4, $x_{i,j,min}$ and $x_{i,j,max}$ represent the minimum and maximum allowable limits of the jth parameter (i.e., $x_{i,j}$), respectively. Observe from Fig. 4(a) that the classical DE crossover only searches the two points $\{x_{i,j}^k, v_{i,j}^k\}$ of the feasible region of the jth dimension of the solution space for each individual. On the other hand, based on (11)–(13), Fig. 4(b)–(d) shows that the three operators of the proposed crossover search the area between $x_{i,j}^k$ and $v_{i,j}^k$ and outside this area on both sides. In other words, these three operators can cover the feasible region of each dimension of the solution space much better than the classical DE crossover, leading to significant enhancement of the search capability of the proposed NDE. Thus, the NDE will have a high chance of finding the global optimum of the optimization problem (here, training of the RNN).

The other new aspect of the proposed NDE, compared to the classical DE, is its selection mechanism. Instead of the deterministic selection between $X_i^k$ and $Y_i^k$ shown in (10), the proposed NDE selects $X_i^{k+1}$ among $X_i^k$, $Y_i^{k,1}$, $Y_i^{k,2}$, $Y_i^{k,3}$ and $V_i^k$ in a stochastic manner. For this purpose, at first, a normalized fitness function, denoted by NFF(·), is defined for each of these five individuals as follows:

$$NFF(X_i^k) = \frac{1/OF(X_i^k)}{1/OF(X_i^k) + 1/OF(Y_i^{k,1}) + 1/OF(Y_i^{k,2}) + 1/OF(Y_i^{k,3}) + 1/OF(V_i^k)} \tag{14}$$

$$NFF(Y_i^{k,1}) = \frac{1/OF(Y_i^{k,1})}{1/OF(X_i^k) + 1/OF(Y_i^{k,1}) + 1/OF(Y_i^{k,2}) + 1/OF(Y_i^{k,3}) + 1/OF(V_i^k)} \tag{15}$$

$$NFF(Y_i^{k,2}) = \frac{1/OF(Y_i^{k,2})}{1/OF(X_i^k) + 1/OF(Y_i^{k,1}) + 1/OF(Y_i^{k,2}) + 1/OF(Y_i^{k,3}) + 1/OF(V_i^k)} \tag{16}$$

$$NFF(Y_i^{k,3}) = \frac{1/OF(Y_i^{k,3})}{1/OF(X_i^k) + 1/OF(Y_i^{k,1}) + 1/OF(Y_i^{k,2}) + 1/OF(Y_i^{k,3}) + 1/OF(V_i^k)} \tag{17}$$

$$NFF(V_i^k) = \frac{1/OF(V_i^k)}{1/OF(X_i^k) + 1/OF(Y_i^{k,1}) + 1/OF(Y_i^{k,2}) + 1/OF(Y_i^{k,3}) + 1/OF(V_i^k)} \tag{18}$$

The training error of the RNN, i.e., OF(·), is in $\mathbb{R}^+$ and thus each of the five normalized fitness functions in (14)–(18) is in the range [0, 1]. Moreover, we have:

$$NFF(X_i^k) + NFF(Y_i^{k,1}) + NFF(Y_i^{k,2}) + NFF(Y_i^{k,3}) + NFF(V_i^k) = 1 \tag{19}$$

Thus, the range [0, 1] can be partitioned into the normalized fitness functions as shown in Fig. 5 and the proposed selection mechanism selects $X_i^{k+1}$ as follows:

$$X_i^{k+1} = \begin{cases} X_i^k & \text{if } 0 \le Rand \le NFF(X_i^k) \\ Y_i^{k,1} & \text{if } NFF(X_i^k) < Rand \le NFF(X_i^k) + NFF(Y_i^{k,1}) \\ Y_i^{k,2} & \text{if } NFF(X_i^k) + NFF(Y_i^{k,1}) < Rand \le NFF(X_i^k) + NFF(Y_i^{k,1}) + NFF(Y_i^{k,2}) \\ Y_i^{k,3} & \text{if } NFF(X_i^k) + NFF(Y_i^{k,1}) + NFF(Y_i^{k,2}) < Rand \le NFF(X_i^k) + NFF(Y_i^{k,1}) + NFF(Y_i^{k,2}) + NFF(Y_i^{k,3}) \\ V_i^k & \text{if } NFF(X_i^k) + NFF(Y_i^{k,1}) + NFF(Y_i^{k,2}) + NFF(Y_i^{k,3}) < Rand \le 1 \end{cases} \tag{20}$$

Fig. 5. Partitioning of the range [0,1]. (The interval from 0 to 1 is divided into five consecutive segments of lengths $NFF(X_i^k)$, $NFF(Y_i^{k,1})$, $NFF(Y_i^{k,2})$, $NFF(Y_i^{k,3})$ and $NFF(V_i^k)$, with boundaries at the cumulative sums used in (20).)

In other words, the individual among $\{X_i^k, Y_i^{k,1}, Y_i^{k,2}, Y_i^{k,3}, V_i^k\}$ is selected for which the random number generated by the Rand function falls in the segment of its NFF(·). It is noted that an individual with a lower value of OF(·) has a higher value of NFF(·) based on (14)–(18). In other words, this individual is considered a fitter individual and occupies a larger portion of the interval [0, 1], shown in Fig. 5, giving it a higher chance of being selected as $X_i^{k+1}$. Therefore, the proposed selection mechanism selects the individuals of the next iteration in a stochastic manner, giving fitter individuals more chance of selection. This selection mechanism is more consistent with the stochastic search nature of a DE algorithm than the deterministic selection method of (10), and thus the population of the NDE evolves better along its iterations, producing a more effective final solution.

2.3. Performance of the whole proposed prediction strategy

Following many previous research works in the area, such as [2,4,5,15,21], and without losing generality, this paper focuses on hourly forecasts. Also, the forecasting horizon of 24 h ahead is adopted in this paper.

As described in Section 2.2, the training mechanism of the NDE proceeds in successive iterations to minimize the training error of the RNN, which is the OF(·) of the NDE. However, only minimizing the training error of a neural network (NN) in the training phase may lead to the overfitting problem, in which the NN begins to memorize the training samples instead of learning them. When overfitting occurs in a NN, the training error continues to decrease and it seems that the training process progresses, while in fact the generalization capability of the NN degrades and it loses its prediction ability for unseen forecast samples (generalization is a measure of how well the NN performs on the actual problem once training is complete [19]). To remedy this problem, the generalization performance of the NN should also be monitored along its training phase. However, since forecast error is not available in the training phase, validation error is used as an approximation of it.
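Monitoring the validation error during training can be sketched generically as below. This is a hedged illustration, not the paper's code: `step` (one training iteration returning updated parameters) and `val_error` (validation error of a parameter set) are hypothetical callables, `max_iter` is only a safety cap added for the sketch, and the rule of stopping at the first increase follows the stopping condition described in the training phase of the execution procedure in this section.

```python
def train_with_early_stopping(step, val_error, params, max_iter=1000):
    """Iterate `step` until the validation error starts to increase.

    `step` and `val_error` are hypothetical callables supplied by the caller;
    `max_iter` is a safety cap (assumption), not a tuned setting.
    Returns the parameters with the lowest validation error seen so far.
    """
    best_params, best_err = params, val_error(params)
    for _ in range(max_iter):
        params = step(params)
        err = val_error(params)
        if err > best_err:        # validation error started to increase: stop
            break
        best_params, best_err = params, err
    return best_params, best_err

# Toy usage: the "parameter" is a scalar that grows by 1 per iteration and the
# validation error is smallest at 3, so training should stop there.
best, err = train_with_early_stopping(lambda p: p + 1,
                                      lambda p: (p - 3) ** 2,
                                      params=0)
print(best, err)  # 3 0
```

Keeping the parameters from the iteration with the lowest validation error, rather than the last iteration, is what lets the loop return a model from before generalization began to degrade.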
For NN learning, validation samples are a subset of the training period that are not used for the adjustment of the free parameters of the NN (such as its weights) and are retained unseen for the NN. Thus, the error of the validation samples, or validation error, can give an estimate of the NN error for unseen forecast samples (here, wind powers of 24 h ahead). Compared with the training error, validation error can better measure the generalization capability of a NN and avoid the overfitting problem [19]. However, validation samples should be as similar as possible to forecast samples so that the validation error can give a true estimate of the prediction error. Considering the short run trend characteristic of wind speed/power signals (dependency on the previous neighboring hours' values), the 24 hourly wind powers related to the day before the forecast day (the closest samples to the forecast samples) are taken into account as the validation samples in this paper.

Considering the above explanations, application of the proposed prediction strategy, depicted in Fig. 1(a), for wind power forecast can be summarized as the following execution procedure:

1) Feature selection phase. The inputs of the forecast engine are selected by the MI based feature selection component among the set of candidate inputs.

2) Preparation of training and validation samples. In the numerical experiments of this paper, the 50 days prior to each forecast day are considered as its corresponding training period. Of these, the first 49 days, including 49 × 24 = 1176 hourly samples, are taken into account as the training samples, and the 24 hourly samples of the last day (the day before the forecast day) are considered as the validation samples. These 1176 hourly training samples and 24 hourly validation samples are constructed by the selected inputs based on the historical data.

3) Training phase. The proposed NDE trains the RNN (the forecast engine). The error of the constructed training samples, or training error, is taken into account as its OF(·). The NDE is implemented according to the step by step algorithm of the classical DE, considering that the proposed crossover operator and the proposed selection mechanism should be applied in steps 4 and 5 instead of (9) and (10), respectively. Moreover, step 6 (stopping condition) of the step by step algorithm is also modified. Instead of setting a maximum number of iterations $k_{max}$, the validation error of the RNN (the validation error of the best individual of the NDE owning the lowest validation error) is also monitored along the iterations. Whenever the validation error starts to increase, the generalization performance of the RNN begins to degrade, indicating the occurrence of the overfitting problem, and so the training process of the RNN by the NDE should be terminated at this iteration (also known as the early stopping technique). The NDE iteration with the minimum validation error brings the final results of the RNN training, wherein it is expected that the generalization capability of the RNN is maximized. The best individual of the NDE in this iteration determines all free parameters of the RNN shown in (6). Now, the forecast engine is trained and ready for the wind power forecast process. In addition to avoiding the overfitting problem, the proposed stopping condition does not require setting $k_{max}$ (the maximum number of NDE iterations is automatically determined based on the RNN validation error), which is another advantage.

4) Prediction phase. The wind power forecast of the RNN is obtained from (5). However, as seen from Fig. 1(b), the proposed forecast engine has only one output. Multi-period forecast, e.g., prediction of hourly wind power for the next 24 h, is reached via recursion, i.e., by feeding input variables with the outputs of the forecast engine. For instance, the predicted wind power for the first hour is used as WP(t − 1) for the wind power forecast of the second hour, provided that WP(t − 1) is among the selected inputs.

3. Numerical results

The proposed prediction strategy is tested for both wind power forecast of Sotavento wind farm in Spain and prediction of aggregated wind power of the Irish power system using their real data obtained from [22,23], respectively. Moreover, some wind speed forecast results of the proposed strategy are presented to illustrate its effectiveness for this forecast process as well.

3.1. Wind power forecast results for the Irish power system

Table 1
RMSE and MMAPE results for prediction of aggregated wind power of Irish power system in September and October 2010.

| Test month | Persistence method RMSE (MW) | Persistence method MMAPE (%) | Irish EirGrid company RMSE (MW) | Irish EirGrid company MMAPE (%) | Proposed strategy RMSE (MW) | Proposed strategy MMAPE (%) |
|---|---|---|---|---|---|---|
| September 2010 | 125.33 | 24.11 | 118.21 | 23.93 | 77.21 | 15.87 |
| October 2010 | 164.12 | 25.91 | 160.56 | 23.71 | 73.28 | 14.69 |
| Average | 144.73 | 25.01 | 139.38 | 23.82 | 75.24 | 15.28 |

Since the first wind farm project was realized in 1992, 110 wind farms with a total capacity of 1379 MW had been installed in Ireland by the end of June 2010. More details about this test case can be found in [23]. In the proposed prediction strategy, the set of candidate inputs (Fig. 1) for this test case includes 50 lagged wind power values. The exogenous variables of the weather parameters are not used for the Irish test case, since its wind farms are spread around the country. Thus, the measured weather parameters of a single weather station may not be informative for the prediction of aggregated wind power of the entire system, and the weather parameters of all of these wind farms are not available to us.
The wind generation of a power system with many wind farms (such as the Irish power system) is, however, a smoother signal with fewer sudden changes than the wind power of a single wind farm [24]. This is due to the aggregation of the wind farms' generation, leading to higher inertia for the system wind generation. A similar situation can be seen when comparing system loads with bus loads [25]. Thus, considering only lagged wind power values is reasonable for the wind power forecast of such a power system. The MI based feature selection component refines the set of candidate inputs and selects the most informative features for the forecast engine as described in the previous section.

The forecast errors obtained from the proposed strategy for prediction of the aggregated wind power of the Irish power system in the two test months of September 2010 and October 2010 are shown in Table 1 and compared with the forecast errors of the Irish EirGrid company and the persistence method. EirGrid holds licenses as independent electricity Transmission System Operator (TSO) and Market Operator (MO) in the wholesale trading system in Ireland. The wind power forecast results of EirGrid, reported in Table 1, have been taken from their website [23]. Persistence is the benchmark method most frequently used for comparison in wind power forecast research works such as [2,15,21]. In this method, the forecast for all future time intervals in the forecast horizon is set to the last measured value.

Two error criteria are adopted for the numerical experiments of this paper. The first one is the root mean square error (RMSE), frequently used as an error metric in wind power/speed forecast research works, defined as follows:

$$\mathrm{RMSE} = \left[ \frac{1}{NH} \sum_{t=1}^{NH} \left( S_{ACT(t)} - S_{FOR(t)} \right)^{2} \right]^{1/2} \tag{21}$$

where $S_{ACT(t)}$ and $S_{FOR(t)}$ indicate the actual and forecast values of the signal (wind power or wind speed) for hour $t$. Here, $NH$ (number of hours) is 720 and 744 for the September and October test months, respectively. It is noted that in the day-ahead forecast process, the historical data is updated at the end of each day. Then, the feature selection phase, the preparation of training and validation samples and the training phase (steps 1–3 of the execution procedure mentioned at the end of the previous section) are performed with the updated data, followed by the prediction of the hourly wind power values of the next day (prediction phase, or step 4 of the execution procedure). However, the RMSE in (21) is computed for one month to give a better evaluation of the forecast error over a longer period.

Mean absolute percentage error (MAPE) is one of the most commonly used error measures for forecast processes, and is defined as follows:

$$\mathrm{MAPE} = \frac{1}{NH} \sum_{t=1}^{NH} \frac{\left| S_{ACT(t)} - S_{FOR(t)} \right|}{S_{ACT(t)}} \times 100 \tag{22}$$

where $S_{ACT(t)}$, $S_{FOR(t)}$ and $NH$ are as defined for (21). However, MAPE cannot be directly used to measure the prediction error of wind power/speed forecast, since $S_{ACT(t)}$ may be very small or even zero for these forecast processes. Thus, MAPE could be very large or even reach infinity if the forecast is not zero. To remedy this problem, a modified version of MAPE, denoted by MMAPE, is defined as follows:

$$\mathrm{MMAPE} = \frac{1}{NH} \sum_{t=1}^{NH} \frac{\left| S_{ACT(t)} - S_{FOR(t)} \right|}{S_{AVE\text{-}ACT}} \times 100, \qquad S_{AVE\text{-}ACT} = \frac{1}{NH} \sum_{t=1}^{NH} S_{ACT(t)} \tag{23}$$

In other words, the average value of the signal over the evaluation period (here, one month) is used in the denominator to avoid the problem caused by very small or zero values of wind power/speed.

Observe from Table 1 that both the RMSE and MMAPE of the wind power forecast of the proposed strategy, which employs only the past values of wind power, are considerably lower than the RMSE and MMAPE of the forecasts of the persistence method and the EirGrid Company in both test months. Moreover, the average RMSE and average MMAPE of the proposed strategy, reported in the last row of Table 1, are (144.73 − 75.24)/144.73 = 48.0% and (25.01 − 15.28)/25.01 = 38.9% lower than those of the persistence method, and (139.38 − 75.24)/139.38 = 46.0% and (23.82 − 15.28)/23.82 = 35.9% lower than those of the EirGrid Company. To better illustrate the wind power prediction accuracy of the proposed strategy, its forecast is compared with the forecast of the EirGrid Company (which has better results than the other benchmark method of Table 1) in Fig. 6(a) and (b). Observe from these figures that the predictions of the proposed strategy are more accurate, with lower deviations from the real curves, than the predictions of the EirGrid company in both test months. The comparisons of Table 1 and Fig. 6(a) and (b) reveal the wind power forecast capability of the proposed strategy.

Fig. 6. Curves of the real values (black, solid line), forecast of the proposed strategy (blue, dashed line) and forecast of EirGrid Company (red, dash-dot line) for the test months of September 2010 (a) and October 2010 (b). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

To evaluate the effectiveness of the proposed NDE for training of the RNN, it is replaced by simulated annealing (SA), genetic algorithm (GA), particle swarm optimization (PSO) and classical DE (four well-known stochastic search techniques) in the proposed prediction strategy, and the obtained results are compared in Table 2. GA in [12] and PSO in [14,16] have been used for training of RNN. For the sake of a fair comparison, all training methods of Table 2 have the same set of candidate inputs, the same feature selection technique and the same training, validation and forecast samples. As seen, the proposed NDE results in the lowest RMSE and lowest MMAPE among all methods of Table 2 in both test months, illustrating the effectiveness of the NDE for training of the forecast engine.

Table 2
Comparison of the proposed NDE with SA, GA, PSO and DE for training of the forecast engine (RNN) on the test case of Table 1.

| Test month | SA RMSE (MW) | SA MMAPE (%) | GA RMSE (MW) | GA MMAPE (%) | PSO RMSE (MW) | PSO MMAPE (%) | DE RMSE (MW) | DE MMAPE (%) | Proposed RMSE (MW) | Proposed MMAPE (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| September 2010 | 127.91 | 24.22 | 118.41 | 21.54 | 111.84 | 19.83 | 102.42 | 17.19 | 77.21 | 15.87 |
| October 2010 | 139.64 | 25.61 | 119.71 | 22.14 | 110.48 | 19.76 | 100.24 | 16.09 | 73.28 | 14.69 |
| Average | 133.77 | 24.91 | 119.06 | 21.84 | 111.16 | 19.79 | 101.33 | 16.64 | 75.24 | 15.28 |

3.2. Wind power/speed forecast results for the Sotavento wind farm

Sotavento wind farm, located in Galicia, Spain, has a line of 24 wind turbines of 5 different technologies. Its nominal power is 17.56 MW and its predicted annual production is 38.5 GWh [22]. The set of candidate inputs for this test case includes 50 lagged wind power, wind speed and wind direction values as well as predicted values for the wind speed and direction (in total, 50 × 3 + 2 = 152 candidate inputs). For this single wind farm, the weather parameters' data can be obtained from its weather station [22]. Effective inputs for the wind power forecast process are selected among this set by the feature selection technique.

Table 3
RMSE and MMAPE results for wind power forecast of Sotavento wind farm in April–July 2010.

| Test month | Persistence RMSE (MW) | Persistence MMAPE (%) | Multivariate ARIMA RMSE (MW) | Multivariate ARIMA MMAPE (%) | RBF RMSE (MW) | RBF MMAPE (%) | MLP RMSE (MW) | MLP MMAPE (%) | Proposed strategy RMSE (MW) | Proposed strategy MMAPE (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| April 2010 | 1.124 | 35.91 | 0.843 | 28.74 | 0.594 | 25.08 | 0.514 | 22.44 | 0.463 | 7.75 |
| May 2010 | 0.848 | 30.84 | 0.742 | 27.21 | 0.516 | 18.11 | 0.618 | 19.82 | 0.435 | 11.43 |
| June 2010 | 0.784 | 34.33 | 0.702 | 28.91 | 0.593 | 26.45 | 0.521 | 25.24 | 0.437 | 16.06 |
| July 2010 | 0.826 | 36.84 | 0.691 | 29.54 | 0.501 | 27.75 | 0.467 | 26.14 | 0.376 | 9.33 |
| Average | 0.895 | 34.48 | 0.744 | 28.60 | 0.551 | 24.35 | 0.530 | 23.41 | 0.428 | 11.14 |

Table 4
RMSE and MMAPE results for wind speed forecast of Sotavento wind farm in April–July 2010.

| Test month | Persistence RMSE (m/s) | Persistence MMAPE (%) | Multivariate ARIMA RMSE (m/s) | Multivariate ARIMA MMAPE (%) | RBF RMSE (m/s) | RBF MMAPE (%) | MLP RMSE (m/s) | MLP MMAPE (%) | Proposed strategy RMSE (m/s) | Proposed strategy MMAPE (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| April 2010 | 9.11 | 99.23 | 7.92 | 90.15 | 6.12 | 69.34 | 6.33 | 70.14 | 2.13 | 23.08 |
| May 2010 | 9.21 | 98.15 | 7.52 | 84.34 | 6.92 | 76.91 | 5.38 | 58.12 | 1.74 | 22.23 |
| June 2010 | 9.34 | 97.69 | 8.02 | 85.59 | 6.48 | 70.29 | 6.51 | 73.39 | 1.50 | 21.63 |
| July 2010 | 8.56 | 98.77 | 6.34 | 74.25 | 6.31 | 72.34 | 5.93 | 66.91 | 1.51 | 20.67 |
| Average | 9.05 | 98.46 | 7.45 | 83.58 | 6.46 | 72.22 | 6.04 | 67.14 | 1.72 | 21.90 |

Table 5
RMSE and MMAPE results of the proposed strategy for wind power forecast of Sotavento wind farm in April–July 2010 with different training periods.

| Test month | 30 days RMSE (MW) | 30 days MMAPE (%) | 40 days RMSE (MW) | 40 days MMAPE (%) | 50 days RMSE (MW) | 50 days MMAPE (%) | 60 days RMSE (MW) | 60 days MMAPE (%) | 70 days RMSE (MW) | 70 days MMAPE (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| April 2010 | 1.051 | 22.43 | 0.581 | 13.08 | 0.463 | 7.75 | 0.562 | 12.82 | 0.752 | 18.23 |
| May 2010 | 1.004 | 24.49 | 0.554 | 13.91 | 0.435 | 11.43 | 0.532 | 12.72 | 0.703 | 16.25 |
| June 2010 | 0.841 | 20.42 | 0.573 | 17.41 | 0.437 | 16.06 | 0.501 | 17.02 | 0.686 | 20.07 |
| July 2010 | 0.862 | 17.91 | 0.416 | 12.27 | 0.376 | 9.33 | 0.420 | 13.44 | 0.662 | 16.02 |
| Average | 0.939 | 21.31 | 0.531 | 14.17 | 0.428 | 11.14 | 0.504 | 14.00 | 0.701 | 17.64 |

In the previous case study, the effectiveness of the proposed strategy for prediction of the aggregated wind power of a power system and the efficiency of the NDE to train the proposed forecast engine have been shown. For the sake of conciseness and to avoid repetition, the other aspects of the proposed strategy, including its ability to predict the wind power and wind speed of a wind farm, are evaluated in this case study.
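The error measures (21)–(23), the persistence benchmark and the relative-improvement arithmetic quoted for Table 1 can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, and treating a 0/0 term in MAPE as contributing zero is an assumption made here only so that the function is well defined.

```python
import math
from typing import List, Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean square error, Eq. (21)."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """MAPE, Eq. (22); infinite when an actual value is zero but the forecast is not."""
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            if f != 0:
                return math.inf
            continue  # assumption: a 0/0 term contributes nothing
        total += abs(a - f) / abs(a)
    return total / len(actual) * 100


def mmape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """MMAPE, Eq. (23): absolute errors scaled by the mean actual value."""
    n = len(actual)
    s_ave_act = sum(actual) / n
    if s_ave_act == 0:
        raise ValueError("mean of the actual signal is zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n / s_ave_act * 100


def persistence_forecast(last_measured: float, horizon: int) -> List[float]:
    """Persistence benchmark: every future interval gets the last measured value."""
    return [last_measured] * horizon


def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100


if __name__ == "__main__":
    actual, forecast = [0.0, 2.0, 4.0], [1.0, 2.0, 2.0]
    print(rmse(actual, forecast), mape(actual, forecast), mmape(actual, forecast))
    # Average RMSE of persistence vs. proposed strategy, as quoted from Table 1.
    print(round(relative_improvement(144.73, 75.24), 1))
```

In the toy series, the zero actual value makes MAPE infinite while MMAPE stays finite, which is the motivation given for (23).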
The test months are also changed to illustrate the effectiveness of the proposed strategy for different months and seasons. Wind power and wind speed forecast errors of the proposed strategy for Sotavento wind farm in the four test months of April 2010, May 2010, June 2010 and July 2010 are shown in Tables 3 and 4, respectively, and compared with the results of some other well-known forecast methods. The benchmark methods of Tables 3 and 4 include the persistence method, multivariate ARIMA time series, radial basis function (RBF) neural network and multi-layer perceptron (MLP) neural network trained by the efficient Levenberg–Marquardt (LM) learning algorithm. These forecast methods have been used in many other wind power forecast research works such as [2,4–6,15,21]. All forecast methods of Tables 3 and 4 have the same set of candidate inputs, feature selection technique and training period (except the persistence method, which does not require a feature selection and training process), since the purpose of these numerical experiments is to compare the efficiency of different forecast engines. RMSE and MMAPE in these tables are as defined in (21) and (23), respectively. Observe that the RMSE and MMAPE of the proposed strategy are lower than those of all other methods of Tables 3 and 4 in all four test months, for both wind power and wind speed forecasts. On average, its wind power RMSE is 0.428 MW against 0.530 MW for the best benchmark (MLP), and its wind speed RMSE is 1.72 m/s against 6.04 m/s.

With a short training period, which includes only a small number of training samples, a neural network based forecast engine usually cannot correctly learn the input/output mapping function of the forecast process. On the other hand, a long training period includes distant historical data.
Considering the time-variant behavior of the wind power mapping function, distant training samples are usually not informative and may even be misleading for a wind power forecast engine. Thus, to select the training period, we started with a short period and then gradually increased it. The execution procedure was implemented with each training period and the obtained wind power forecast results were recorded. Sample results of this analysis with training periods of 30 days, 40 days, 50 days, 60 days and 70 days prior to the forecast day are shown in Table 5. It can be observed from this table that the 50-day training period results in the lowest forecast errors in terms of both RMSE and MMAPE. Thus, this training period has been selected, leading to 24 validation samples (hourly samples of the day before the forecast day) and (50 − 1) × 24 = 49 × 24 = 1176 training samples.

In the performed numerical experiments, the user-defined settings of the proposed strategy (defined in Section 2.2) are selected based on a few trial runs. Once these settings are fixed, the whole computation time of the proposed prediction strategy, including steps 1–4 of the execution procedure, is about 1 min for the test cases of this paper. This computation time, measured on a simple hardware set of Pentium P4 3.6 GHz with 4 GB RAM, is completely acceptable within a day-ahead decision making framework. It is even acceptable for shorter forecast intervals such as hour-ahead or 10-min-ahead predictions.

4. Conclusion

Wind power is a nonlinear multivariate function with spatial inhomogeneity and hyperplane singularities. Ridgelets constitute an efficient basis set for constructing such functions. Considering this matter, an RNN, having ridge functions as the activation functions of the hidden layer nodes, is presented for wind power forecast in this paper. For training of the suggested forecast engine (determination of its free parameters), a new stochastic search technique, named NDE, is proposed.
The proposed NDE has a low computation burden and only requires a small set of training and validation samples. Moreover, it can widely search the solution space in various directions, increasing the chance of finding the global optimum of the training problem. Efficiency of the proposed prediction strategy for wind power forecast of both single wind farms and power systems and wind speed forecast is extensively illustrated.

References

[1] A report prepared by World Wind Energy Association, World Wind Energy Report 2009, March 2010, http://www.wwindea.org/home/images/stories/worldwindenergyreport2009_s.pdf.
[2] S. Fan, J.R. Liao, R. Yokoyama, L. Chen, W.-J. Lee, Forecasting the wind generation using a two-stage network based on meteorological information, IEEE Trans. Energy Convers. 24 (2) (2009) 474–482.
[3] J.W. Taylor, P.E. McSharry, R. Buizza, Wind power density forecasting using ensemble predictions and time series models, IEEE Trans. Energy Convers. 24 (3) (2009) 775–782.
[4] G. Sideratos, N. Hatziargyriou, Using radial basis neural networks to estimate wind power production, IEEE Power Engineering Society General Meeting (June 2007) doi:10.1109/PES.2007.385812.
[5] A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, E. Feitosa, A review on the young history of the wind power short-term prediction, Renew. Sustain. Energy Rev. 12 (6) (2008) 1725–1744.
[6] M. Lei, L. Shiyan, J. Chuanwen, L. Hongling, Z. Yan, A review on the forecasting of wind speed and generated power, Renew. Sustain. Energy Rev. 13 (4) (2009) 915–920.
[7] J.P.S. Catalao, H.M.I. Pousinho, V.M.F. Mendes, Short-term wind power forecasting in Portugal by neural networks and wavelet transform, Renew. Energy 36 (April (4)) (2011) 1245–1251.
[8] R. Blonbou, Very short-term wind power forecasting with neural networks and adaptive Bayesian learning, Renew. Energy 36 (March (3)) (2011) 1118–1124.
[9] G. Li, J. Shi, J.Y. Zhou, Bayesian adaptive combination of short-term wind speed forecasts from neural network models, Renew. Energy 36 (January (1)) (2011) 352–359.
[10] N. Amjady, F. Keynia, Day-ahead price forecasting of electricity markets by mutual information technique and cascaded neuro-evolutionary algorithm, IEEE Trans. Power Syst. 24 (1) (2009) 306–318.
[11] E.J. Candes, Ridgelet Theory and Applications, PhD Thesis, Department of Statistics, Stanford University, 1998.
[12] S. Yang, M. Wang, L. Jiao, A linear ridgelet network, Neurocomputing 73 (2009) 468–477.
[13] S. Yang, M. Wang, L. Jiao, Approximation of functions with spatial inhomogeneity based on true ortho-ridgelet neural network, Appl. Soft Comput. 11 (2) (2011) 2444–2451.
[14] R. Su, L. Kong, S. Song, P. Zhang, K. Zhou, J. Cheng, A new ridgelet neural network training algorithm based on improved particle swarm optimization, Third International Conference on Natural Computation (ICNC 2007), Haikou, Hainan, China, August 24–27, 2007, ISBN: 0-7695-2875-9.
[15] I.G. Damousis, M.C. Alexiadis, J.B. Theocharis, P.S. Dokopoulos, A fuzzy model for wind speed prediction and power generation in wind parks using spatial correlation, IEEE Trans. Energy Convers. 19 (2) (2004) 352–361.
[16] S. Yang, M. Wang, L. Jiao, Ridgelet kernel regression, Neurocomputing 70 (2007) 3046–3055.
[17] Ridgelets: a key to higher-dimensional intermittency, http://www-stat.stanford.edu/~donoho/Reports/1999/RoySoc.pdf.
[18] Digital Ridgelet Transform based on True Ridge Functions, http://www.famaf.unc.edu.ar/~flesia/PDF-files/DonohEetAl_ridgelets.pdf.
[19] D.R. Hush, B.G. Horne, Progress in supervised neural networks, IEEE Signal Process. Mag. 10 (1) (1993) 8–39.
[20] R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim. 11 (4) (1997) 341–359.
[21] R.G. Kavasseri, K. Seetharaman, Day-ahead wind speed forecasting using fARIMA models, Renew. Energy 34 (5) (2009) 1388–1393.
[22] Sotavento wind farm data, http://www.sotaventogalicia.com/tiempo_real/english/instantaneos.php.
[23] Ireland wind power data, http://www.eirgrid.com/operations/systemperformancedata.
[24] M. Milligan, et al., Wind power myths debunked, IEEE Power Energy Mag. 7 (6) (2009) 89–99.
[25] N. Amjady, Short-term bus load forecasting of power systems by a new hybrid method, IEEE Trans. Power Syst. 22 (1) (2007) 333–341.