IEEE TRANSACTIONS ON SMART GRID, VOL. 7, NO. 3, MAY 2016
Provision of Regulation Service by Smart Buildings
Enes Bilgin, Member, IEEE, Michael C. Caramanis, Senior Member, IEEE,
Ioannis Ch. Paschalidis, Fellow, IEEE, and Christos G. Cassandras, Fellow, IEEE
Abstract—Regulation service (RS) reserves, a critical type of
bi-directional capacity reserves, are provided today by expensive
and environmentally unfriendly centralized fossil fuel generators.
This paper investigates provision of RS reserves by the demand
side. We consider a smart building operator that is capable of
modulating the aggregate consumption of the building loads via
price signals in response to an unanticipated RS signal that an
independent system operator broadcasts. We first model the RS
signal and load behavior, and formulate the related stochastic
dynamic programming (DP) problem. Then, in order to deal with the complexity of the DP problem resulting from the uncountably infinite allowable price set, we characterize certain key properties of the DP dynamics, solve the DP problem for a discretized price policy to observe the structure of the optimal policy, and recapture the continuous price policy in an analytic approximate policy iteration (API) algorithm using the above properties
and structure. We finally provide numerical evidence that the
novel API algorithm converges to a continuous dynamic price
policy that outperforms optimal discretized price policies in both
computational effort and average cost.
Index Terms—Distributed demand management, regulation
service provision, real-time stochastic control, hour-ahead and
real-time market response.
I. INTRODUCTION
THE INTEGRATION of renewable energy into the power grid is progressing at an increasing rate ([1]). The
intermittency and volatility of renewable generation, however, results in a commensurate increase in the reserves that
Independent System Operators (ISOs) must secure ([2], [3]).
A rough rule of thumb in [4] suggests an additional 1 MW of RS reserve requirement for each 100 MW increase in the nameplate capacity of wind power in a system. Among the various types of reserves, Regulation Service (RS) reserve is
considered to be the most valuable one since an RS reserve
provider is required to respond to an RS signal, which is updated every few seconds, by adjusting its output up and down. Not surprisingly, RS is also the most expensive reserve type on the market, and its cost is comparable to or higher than the cost of energy. As an example, in the first quarter of 2014, the average price of electricity (Locational Marginal Price) was around $40−55/MWh in the region managed by the Independent System Operator of New England (ISONE). During the same time period, the average final hourly RS clearing price was $44.4/MWh. In addition, 17% of the time, the RS price was above $70/MWh, with a maximum of $1407/MWh on January 28, 2014 at 8 a.m. ([5]).

Manuscript received November 16, 2014; revised May 3, 2015 and October 2, 2015; accepted November 6, 2015. Date of publication December 2, 2015; date of current version April 19, 2016. This work was supported by the National Science Foundation through the Division of Electrical, Communications, and Cyber Systems under Grant EFRI-1038230. Paper no. TSG-01132-2014.
E. Bilgin is with IT Global Operations, Advanced Micro Devices, Inc., Austin, TX 78735 USA (e-mail: [email protected]).
M. C. Caramanis is with the Department of Mechanical Engineering and the Division of Systems Engineering, Boston University, Brookline, MA 02446 USA (e-mail: [email protected]).
I. C. Paschalidis and C. G. Cassandras are with the Department of Electrical and Computer Engineering, and the Division of Systems Engineering, Boston University, Boston, MA 02215 USA (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSG.2015.2501428
Today, RS reserves are provided by expensive and environmentally unfriendly centralized fossil fuel generators.
This study investigates provision of RS reserves by the
demand side. In particular, we consider a Smart Building
Operator (SBO) that is capable of influencing the aggregate consumption of a population of flexible loads, through
a dynamic control mechanism, in response to the RS signal
that the ISO broadcasts to preserve the energy balance in the
power system. We assume that the SBO anticipates an average aggregate consumption rate of A kW for the hour of operation and purchases AΔt kWh of energy in the forward power market (hour ahead market), where Δt = 1 h. In addition, the SBO sells R kW of RS reserve in the hour ahead market for the same one-hour time horizon. Denoting the hour ahead energy and RS market clearing prices by Π^E and Π^R, respectively, the SBO is charged ΔtAΠ^E − RΠ^R; notice that the SBO is credited RΠ^R for providing RS reserve. As an RS reserve provider, however, the SBO promises to modulate its consumption rate, P(t), in the range [A − R, A + R] during the hour of operation in order to track the RS signal, y(t) ∈ [−1, 1], which is updated at Δt = 4-second time intervals, as in the current practice at the ISONE. Perfect tracking requires that P(t) reach A + Ry(t) by the next signal update, at t + Δt. Otherwise, the SBO is charged a penalty proportional to the tracking error |P(t + Δt) − (A + Ry(t))|. The SBO's optimal dynamic control policy must balance (i) the tracking error penalty minus the utility realized by flexible appliance users during the period Δt against (ii) future costs affected by P(t + Δt). This study focuses on developing such an optimal
dynamic pricing policy for the use of the SBO to modulate
its aggregate consumption in response to the RS signal for
given A and R values. It is important to emphasize that the
RS signal is determined by the ISO as a result of the energy
imbalance in a much larger area compared to the size of the
smart building of interest. Therefore, the RS signal is beyond
the control of the SBO, and the effect of the SBO's actions on the overall energy balance is insignificant. Also note that the discussion of how to choose A and R in the hour ahead market is beyond the scope of this paper; we refer the reader to [6].
1949-3053 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Obtaining a dynamic control policy for the SBO requires (i)
modeling the dynamics of the RS signal, which is essentially
an exogenous random variable, and (ii) modeling the energy
consumption behavior of the flexible loads together with
their interaction with the price control. This paper provides
a Markov chain representation of the RS signal dynamics.
In order to model the latter, we consider a generic flexible
appliance whose operation and requirements are similar to
autonomous heating or cooling appliances, space air renewal
ventilators, refrigerators, electric water heaters, and in general
loads that demand a fixed amount of energy over a certain
time period. Rather than controlling the appliance consumption directly, the SBO broadcasts a price signal. Each appliance then decides, in an automated manner, whether to consume energy by comparing the price with its energy need at discrete time intervals. This type of control approach is known as “indirect control” in the power systems literature. In [7], the author proposes a hierarchical
control strategy for the power system and introduces the idea
of price-based demand response. In [8], the use of real-time
prices to assist in the control of frequency and tie-line deviations in electric power systems is discussed. In [9], the authors
develop a state-queuing model to analyze the price response
of aggregated loads consisting of thermostatically controlled
appliances (TCAs). Finally, in [10], a model predictive control
strategy is proposed for tracking an RS signal by groups of
flexible loads such as plug-in hybrid electric vehicles (PHEVs)
and TCAs.
This paper is an extension and generalization of past
work [11] and [12]. In particular, the model investigated here
differs relative to [11] in the following ways: (i) RS signals
are uncontrollable by the SBO as they depend on a proportional integral filter of the Area Control Error and System
Frequency excursions, (ii) a Two-Dimensional Markov chain
model whose parameters are estimated using actual historical RS signal dynamics is used to represent the statistical
behavior of y(t) available to the SBO, and (iii) a dynamic
control strategy is obtained in place of a static asymptotic
control. Furthermore, we derive properties of the average cost
resulting from the optimally controlled RS provision and show
that these properties can assist the SBO in bidding optimally for Energy and RS in the Hour Ahead Market. We
also extend the results reported in [12] as they pertain to
the utility of smart building appliance users and provide a
novel Approximate Policy Iteration (API) solution algorithm
and extensive numerical experience. The rest of the paper is
organized as follows: In Section II, we present a nomenclature that summarizes the definitions in the paper. In Section III,
we model the RS signal stochastic dynamics, define the flexible load response to SBO control actions, and formulate
the resulting Stochastic Dynamic Programming problem. In
Section IV, we compare dynamic and static control policies,
and show how the average appliance user utility decreases
under dynamic control policies. In Section V, we develop a
Linear Programming based solution methodology for a discretized allowable control space approximation, make observations on the structure of the associated optimal policy, and
use it to develop an analytic approximate policy iteration (API)
approach that can be calibrated to relax the control space discretization approximation and provide near optimal control.
We present numerical results in Section VI and conclude in
Section VII.
II. NOMENCLATURE

Π^E  Energy clearing price ($/kWh).
Π^R  RS reserve clearing price ($/kW).
A  Anticipated average energy consumption rate for the SBO given Π^E (kW).
R  Amount of RS reserve the SBO sells in the hour ahead power market (kW).
P(t)  SBO energy consumption rate at time t (kW).
T  Time horizon of the DP problem (min).
y(t)  Value of the RS signal at time t.
Δy  Change in the RS signal between two updates.
ȳ  Maximum value allowed for |Δy|.
d(t)  RS signal direction at time t, d(t) ∈ {−1, +1}.
N  Number of appliances subject to SBO control.
n(t)  Number of active appliances at time t.
nm  Lower bound imposed on n(t).
nM  Upper bound imposed on n(t).
Δn  Change in the number of active appliances between sequential time steps.
r  Energy consumption rate per appliance (kW).
u(t)  Price signal the SBO broadcasts at time t ($).
λa  Rate at which an appliance monitors the price control u(t) (1/min).
λM  Maximum aggregate connection rate (1/min).
λ(t)  Aggregate connection rate at time t (1/min).
λ̄  Average aggregate connection rate (1/min).
μ  Active appliance disconnection rate (1/min).
UM  Maximum value that u(t) and φi(t) can take ($).
Δu  Discretization increment for u ∈ [0, UM] ($).
ū  Price average over [0, T] ($).
σu²  Variance of the price u(t) over [0, T].
φi(t)  Utility of appliance i at time t ($).
f(φ(t))  Probability density function of φ(t).
Φ[t1,t2]  Utility realized over [t1, t2] ($).
C[t1,t2]  Tracking cost incurred over [t1, t2] ($).
κ  Tracking cost coefficient ($/kW²s).
Ti(t)  Temperature in zone i at time t (°F).
Tmin,i  Desired minimum temperature for zone i (°F).
Tmax,i  Desired maximum temperature for zone i (°F).
h(x)  Differential cost of the system state x ($).
J  Average cost of the DP problem ($).
pθ(x)  Linear Programming dual variable associated with state x with parameter vector θ.
θi  ith element of the parameter vector θ.
θi(k)  Value of θi at the kth iteration of the API.
uθ(x)  Price policy under θ for state x.
Δθ  Upper bound imposed on |θi(k + 1) − θi(k)|.
ρ  API algorithm parameter to scale Δθ.
Fig. 1. Sample RS Signal Data from PJM.
Fig. 2. Markov Chain Representation of RS Signal Dynamics, y ∈ {−1, −1 + ȳ, . . . , 1 − ȳ, 1}, d ∈ {−1, 1}.
III. PROBLEM FORMULATION
A. RS Signal Dynamics and Tracking Costs
ISOs manage the RS reserves, which are procured from
multiple suppliers in the Hour Ahead Market, in real-time by
broadcasting a single RS signal y(t) ∈ [−1, 1] that is updated in Δt-second intervals. Although Δt may vary among different ISOs, we follow the ISONE practice by setting Δt = 4 seconds. The RS signal is non-deterministic as ISOs calculate the signal value through a proportional-integral filter of system-wide frequency deviation.
Assumption 1: The RS signal the ISO broadcasts is Energy Neutral over an hour, i.e., Σ_{k=0}^{K−1} y(kΔt) = 0, where K = 900, i.e., KΔt = 1 h for Δt = 4 sec.
Assumption 1 implies that if the SBO tracks the regulation signal without error, its total consumption over the hour of operation will be equal to AΔt kWh, Δt = 1 h, the quantity it purchased in the hour ahead market.
Figure 1 displays a sample of actual PJM RS Signal data
over an hour. The figure shows that RS signal dynamics exhibit
monotonically increasing or decreasing sub-trajectories. We
represent these dynamics by a Markov chain model whose
state space is two dimensional, where the first dimension
is the value of the RS signal, y(t), and the second dimension is the “trend” or the “direction” of the RS signal, d(t).
This two-dimensional Markov chain (TDMC) is shown in
Figure 2. The RS signal, y(t), may take any value in [−1, 1]
subject to a maximal ramp rate ȳ between two updates.
However, it is adequate to represent its probabilistic change by Δy = y(t + Δt) − y(t), where Δy is a discrete random variable taking values in the set {−ȳ, 0, ȳ}. The probability mass function of Δy is calibrated appropriately conditional upon y(t) and d(t). The dynamics of d(t) are completely specified by the dynamics of y(t) as follows:
d(t + Δt) = S{Δy} if Δy ≠ 0, and d(t + Δt) = d(t) if Δy = 0,   (1)
where S is the sign function. Note that d(t) ∈ {−1, +1}, so
that there are only two possible states for the direction, namely
“down” and “up.” Based on the value Δy takes, the direction either stays the same or is reversed. When d(t) = +1 and y(t) is small, then Pr[Δy ≥ 0 | d(t) = +1] ≥ Pr[Δy = −ȳ | d(t) = +1], and vice versa for d(t) = −1. In short, the transition probabilities of y(t) depend on the value of d(t) as well as on the magnitude of y(t), which takes values in the discrete state space y(t) ∈ {−1, −1 + ȳ, . . . , 0, . . . , 1 − ȳ, 1}. As y(t) approaches 1, its upward transition probability decreases to 0. Similarly, downward transition probabilities
decrease to 0 as y(t) approaches −1. In order to calibrate
the probability matrix of the TDMC, one needs to derive the
state frequencies of the RS signal using historical ISO data,
and count the number of upward and downward transitions in
each state.
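As a concrete sketch of this calibration step (our own illustration, not code from the paper; the function name and the toy trajectory are assumptions), one can tabulate empirical transition frequencies of the pair (y, d), updating the direction per Equation (1):

```python
def estimate_tdmc(y, y_step):
    """Estimate transition probabilities of the two-dimensional Markov
    chain (y, d) from a sampled RS signal trajectory y on the grid
    {-1, -1 + y_step, ..., 1}. Returns {(y, d): {dy: probability}}."""
    counts = {}
    d = 1  # initial direction is an arbitrary convention
    for k in range(len(y) - 1):
        # Snap the observed change to the grid step.
        dy = round((y[k + 1] - y[k]) / y_step) * y_step
        state = (round(y[k], 6), d)
        counts.setdefault(state, {})
        counts[state][round(dy, 6)] = counts[state].get(round(dy, 6), 0) + 1
        if dy != 0:              # Eq. (1): direction follows the sign of dy
            d = 1 if dy > 0 else -1
    return {s: {dy: c / sum(cs.values()) for dy, c in cs.items()}
            for s, cs in counts.items()}
```

Applied to historical ISO data, these empirical frequencies estimate the conditional p.m.f. of Δy given (y(t), d(t)).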
B. Consumption Dynamics and Utility
Assume that the Smart Building (SB) of interest has N
flexible appliances (e.g., heating or cooling zones, electric
vehicles (EV) etc.) whose aggregate consumption is to be modulated in response to the RS signal. n(t) of N appliances are
“active”, i.e., they consume electricity at time t each at a rate
of r kW. At the same time, N −n(t) of N are “idle” and they do
not consume electricity at time t. We define the transition of an
appliance from idle to active as a “connection” and the transition from active to idle as a “disconnection”. We assume that
each appliance has a smart controller that is designed to ensure that it remains active for an exponentially distributed time with parameter μ, so that the average active duration is 1/μ. In addition, to avoid immediately switching back to the active state, the controller is designed to reconsider connecting after an exponentially distributed time with parameter λa. The probabilistic monitoring and disconnections of the appliances are deliberate, intended to avoid undesirable chattering, synchronization, and exceedingly short active status durations ([9]).
The SBO can influence the rate at which idle appliances
connect by broadcasting a “price” u(t). When an idle appliance i considers connecting at time t, it proceeds to connect
if and only if its “utility” exceeds u(t). The utility level of
appliance i at time t, φi (t), represents the occupant’s “value for
connecting”, i.e., the value measured in dollars for connecting
and consuming at the rate r for an exponentially distributed
time period with average 1/µ. It is reasonable to consider
that φi (t) will be a function of the appliance i heating zone
temperature at time t, Ti (t), a point in the occupant’s comfort
interval [Tmin,i, Tmax,i]. During the cooling season, when Ti(t) is closer to Tmin,i, φi(t) should be low, and when it is closer to Tmax,i, φi(t) should be high. Here, we assume that the appliance utility is always nonnegative. Moreover, UM > 0 denotes the maximum utility for appliance i for the case Ti(t) ≥ Tmax,i; therefore φi(t) ∈ [0, UM]. As a consequence, it is
appropriate to limit the values that u(t) can take to the same
interval, i.e., u(t) ∈ [0, UM ]. Note that UM indicates the price
level beyond which no connections occur; and this is also how
the SBO could obtain an estimate of UM from the historical
consumption data.
We assume that the SBO updates u(t) just after the ISO updates y(t), namely in Δt-second intervals. Since N − n(t)
gives the number of idle appliances at time t, the total
rate at which the idle appliance population monitors the
price at time t is given by (N − n(t))λa . Consider now the
utilities of the idle appliances monitoring the SBO price,
{φ1 (t), φ2 (t), . . . , φi (t), . . . , φN−n(t) (t)}. Given that each idle
appliance is equally likely to monitor the SBO price at time t
due to the identical, independent and exponentially distributed
inter-monitoring times, these utilities are statistically equivalent to a random sample of size Z selected from N − n(t) idle
appliances at time t. We can construct the frequency histogram of utility samples grouped in small intervals Δu spanning the segment [0, UM]. For large N − n(t), the histogram normalized by Z converges to a p.d.f. f(φ(t)) as Δu approaches 0. f(φ(t)) characterizes a random process φ(t) representing the utility of a randomly selected idle appliance at time t. The rate of idle appliances connecting when the SBO broadcasts price u(t) can now be expressed as λa(N − n(t))Pr{φ(t) ≥ u(t)}, where Pr{φ(t) ≥ u(t)} is computed from f(φ(t)). Use of f(φ(t)) simplifies immensely the description of the dynamics of n(t), which would otherwise require that the state include all of the individual idle appliance utilities φi(t), ∀i. Given the relationship between φi(t) and the individual idle appliance space conditioning zone temperature Ti(t), f(φ(t)) varies over time in a manner similar to that of the p.d.f. of Ti(t) over all appliances i. We describe below conditions under which f(φ(t))
can be approximated reasonably by a time invariant p.d.f. that
is constant over the random variable’s range, hence uniform,
over [0, UM ].
Assumption 2: n(t) takes values in a bounded range such
that nm ≤ n(t) ≤ nM , where nm and nM are positive constants,
nM − nm = 2R/r and nM − nm ≪ N − nM .
When nm ≤ n(t) ≤ nM with nM − nm ≪ N − nM, so that (N − nm)/(N − nM) ≈ 1, it is reasonable to approximate (N − n(t))λa by the time invariant constant λM = (N − (nm + nM)/2)λa, and f(φ(t)) by a time invariant uniform distribution over [0, UM]. The assumption above
holds under weather and time invariant building heat loss conditions that result in duty cycle appliances being active a small
fraction of the time rendering n(t) ≪ N, and, in addition,
hour ahead market bids of RS levels that are relatively small,
resulting in 2R/r = nM − nm ≪ N − nM. We have verified the validity of the assumption with extensive simulation, in which we observed that the aggregate consumption stays in the A ± 3R/2 range almost all the time. Although the time invariant uniformity assumption sacrifices some model accuracy for the sake of simplicity, it is commonly adopted in both the communication networks and power systems literatures when the user pool is considered sufficiently large (e.g., [11] and [13]).
We summarize the consequence of the discussion as follows.
Assumption 3: At any given time t, the utility of an idle
appliance is determined by drawing a random number that is
uniformly distributed in [0, UM ].
Therefore, we simulate a random variable φ from a time
invariant uniform probability distribution over the interval
[0, UM ].
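Under Assumption 3, the probability that a monitoring appliance connects at price u is Pr{φ ≥ u} = 1 − u/UM. A quick Monte Carlo check of this (our own sketch, with illustrative values):

```python
import random

def connection_prob(u, U_M, samples=200_000, seed=7):
    """Estimate Pr{phi >= u} for phi ~ Uniform[0, U_M] (Assumption 3)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if rng.uniform(0.0, U_M) >= u)
    return hits / samples

# The estimate should be close to 1 - u / U_M, e.g. about 0.6 for u = 4, U_M = 10.
```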
Poisson Connections and Disconnections During Δt: Given Assumption 3, the probability that an idle appliance connects when it observes the price u(t) at time t is (1 − u(t)/UM). Moreover, given Assumption 2, the aggregate connection rate at time t is λ(t) = λM(1 − u(t)/UM), which limits the connection rate to the set λ(t) ∈ [0, λM].
For the dynamic connection rate λ(t) and the constant disconnection rate μ per active appliance, the controlled dynamics of n(t) coincide with those of an M/M/∞ queue. Note that the connection rate is constant between price updates. Moreover, if a constant price u ∈ [0, UM] is broadcast over consecutive time intervals, the connection rate is λ = λM(1 − u/UM), i.e., time invariant, and the steady state distribution of n(t), as t → ∞, is Poisson with mean λ/μ.
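As a sanity check on the M/M/∞ claim (our own sketch with assumed parameter values, not the paper's simulation), one can simulate the connection/disconnection dynamics under a constant price and compare the long-run mean of n(t) to λ/μ:

```python
import random

def mean_active(lam_M, mu, u, U_M, dt, steps, seed=1):
    """Simulate n(t) with connection rate lam = lam_M * (1 - u / U_M) and
    per-appliance disconnection rate mu, using a small-step discretization,
    and return the time-averaged number of active appliances."""
    rng = random.Random(seed)
    lam = lam_M * (1.0 - u / U_M)
    n, total = 0, 0.0
    for _ in range(steps):
        if rng.random() < lam * dt:                   # one arrival w.p. lam*dt
            n += 1
        n -= sum(1 for _ in range(n) if rng.random() < mu * dt)  # departures
        total += n
    return total / steps

# For u = 0 the M/M/infinity steady-state mean is lam_M / mu.
```

With, say, lam_M = 5 and mu = 0.5 at u = 0, the time average settles near 10, matching λ/μ.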
C. Objective Function
Recall that the SBO is required to modulate its aggregate
consumption rate as it observes the RS signal y(t) at time t so as to reach A + Ry(t) kW by time t + Δt. In other words, it promises to achieve n(t + Δt) connected appliances, each consuming at the same rate of r kW, so that n(t + Δt)r is close to A + Ry(t). To this end, the SBO's task is to influence the idle appliance connection rate by broadcasting an appropriate price u(t) over the period from t to t + Δt. To the extent that it fails to track the RS signal, the SBO is assessed the following tracking cost:

C[t,t+Δt] = Δtκ(n(t + Δt)r − (A + Ry(t)))²   (2)

where κ ≥ 0 is a fixed cost coefficient. We henceforth refer to n(t + Δt)r − (A + Ry(t)) as the “tracking error”. Note that
the quadratic cost function aims to capture two types of costs
that the SBO incurs in case of imperfect tracking: (i) loss of RS reserve provision revenue, RΠ^R, and (ii) the possible cost of losing the RS provision license, which occurs in the case of exceedingly poor tracking of the RS signal ([14]). Therefore, severe tracking errors are penalized more heavily due to the increased possibility of the latter.
As mentioned above, each appliance realizes a utility when
it connects to the system. We consider this utility as a contribution to smart building’s social welfare. Given Assumption 3,
the expected utility realized by an appliance that connects at
time t is E[φ(t)|φ(t) ≥ u(t)] = (u(t) + UM )/2. Therefore, the
expected utility realized over [t, t + Δt] is

Φ[t,t+Δt] = ΔtλM(1 − u(t)/UM)(u(t) + UM)/2.   (3)
An optimal SBO price control policy must optimally trade off tracking cost against the appliance user utility gained
every time an idle appliance connects. Therefore, the SBO’s
objective is
min Σ_{k=0}^{K−1} ( C[kΔt,(k+1)Δt] − Φ[kΔt,(k+1)Δt] ),   (4)

where Δt = 4 seconds and K = 900, so that KΔt = 1 hour.
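To make the bookkeeping in (2)–(4) concrete, here is a small illustrative computation (all parameter values are assumptions, not taken from the paper) of one period's tracking cost, expected utility, and stage cost:

```python
def tracking_cost(n_next, r, A, R, y, kappa, dt):
    """Eq. (2): quadratic penalty on the one-period tracking error."""
    err = n_next * r - (A + R * y)       # tracking error (kW)
    return dt * kappa * err ** 2

def expected_utility(u, lam_M, U_M, dt):
    """Eq. (3): expected utility realized by connecting appliances."""
    return dt * lam_M * (1.0 - u / U_M) * (u + U_M) / 2.0

# Illustrative values (assumptions):
dt = 4.0 / 3600.0                        # 4 seconds, in hours
C = tracking_cost(n_next=105, r=1.0, A=100.0, R=20.0, y=0.5, kappa=0.1, dt=dt)
Phi = expected_utility(u=5.0, lam_M=50.0, U_M=10.0, dt=dt)
stage_cost = C - Phi                     # one term of the sum in (4)
```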
D. Stochastic Dynamic Programming Model

As mentioned before, the SBO makes a price decision every Δt seconds, which is significantly shorter than the one-hour problem horizon. It is also apparent that the SBO makes these price decisions sequentially, which, given the system model, implies a Markov Decision Process. Therefore, the problem is modeled as a discrete-time, finite-state, average cost, infinite horizon Dynamic Programming (DP) problem, which is characterized by the following Bellman equation:

h(n, y, d) + J = min_{u∈[0,UM]} E_{Δn,Δy,d′ | u,n,y,d}[ Δt g(u, n, y, d) + h(n + Δn, y + Δy, d′) ]   (5)

where
• d′ is the new direction, whose stochastic dynamics are described in Equation (1),
• Δy = y(t + Δt) − y(t) is an exogenous random variable independent of u,
• Δn = n(t + Δt) − n(t) is a random variable that depends on u,
• J is the average cost per Δt,
• h(n, y, d) is the differential cost function ([15]),
• g(u, n, y, d) is the one step cost function, i.e., g(u, n, y, d) = κ((n + Δn)r − (A + Ry))² − λM(1 − u/UM)(u + UM)/2.

The optimal price policy, u∗(n, y, d) ∈ [0, UM], is a function of the state of the system, (n, y, d). Once such a policy is obtained, the SBO can observe the n, y, and d values every Δt seconds and broadcast the price the optimal policy recommends in an automated manner. Given Assumption 2, a single price update will only be observed by a small fraction of the appliance population, which prevents the large fluctuations in the aggregate consumption that frequent price changes might otherwise cause.

IV. IMPACT OF DYNAMIC PRICE POLICIES ON AVERAGE CONSUMPTION AND UTILITY

In this section, we discuss the implications of having a dynamic price policy and how it affects the aggregate consumption and utility of the appliances. To this end, we define a discrete time dynamic policy u(k) ∈ [0, UM], so that u(t) = u(k), t ∈ [kΔt, (k + 1)Δt], where k = 0, 1, . . . , K − 1, with KΔt = T. In addition, we define uc ∈ [0, UM] to represent the static price policy of broadcasting a constant price, i.e., u(t) = uc, t ∈ [0, T]. The proofs of the propositions and the corollary in this section are included in the Appendix.

A. Effect of Dynamic Prices on Average Consumption

Proposition 1: The expected electricity consumption of all N appliances over the horizon [0, T] is invariant to dynamic pricing policies u(k) that satisfy (1/T) Σ_{k=0}^{K−1} u(k)Δt = uc, where T = KΔt.

Note that if the SBO were not an RS reserve provider, it would directly reflect the hour ahead energy clearing price in its price policy, which would be a constant one since the SBO would not have to modulate the aggregate consumption; it would therefore set uc = Π^E. Also recall that A is the anticipated average energy consumption rate under Π^E. The SBO, however, is an RS reserve provider and has to employ a dynamic price policy so that P(t) tracks A + Ry(t). By Assumption 1, when the RS signal is tracked closely, the average energy consumption rate will be equal to A. Now, Proposition 1 implies that a dynamic price policy that leads to an average consumption rate A must have a time average equal to Π^E, i.e., ū = (1/T) ∫₀ᵀ u(t) dt = Π^E. Defining λ̄ = λM(1 − ū/UM), we derive the following corollary.

Corollary 1: For sufficiently large λM and κ, the following equation holds:

A = λ̄∗ r/μ   (6)

where
• λ̄∗ = λM(1 − ū∗/UM),
• u∗ is the optimal price policy,
• ū∗ = (1/T) ∫₀ᵀ u∗(t) dt,
• T is the problem horizon.

In Section VI-B, Corollary 1 is verified numerically.
B. Effect of Dynamic Price Variation on the
Time-Average of Utility
Proposition 2: The time-average of the utility rate realized over the period [0, T] is smaller under a dynamic pricing policy u(k) than under the static price policy uc = (1/T) Σ_{k=0}^{K−1} u(k)Δt, for T = KΔt. Moreover, the difference in the average utility rate is equal to λM σu²/(2UM).
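The identity in Proposition 2 is easy to verify numerically; the sketch below (our own, with an arbitrary assumed price sequence) compares the average utility rate under dynamic prices with that under the equivalent static price:

```python
def utility_rate(u, lam_M, U_M):
    """Expected utility per unit time at price u, Eq. (3) divided by dt."""
    return lam_M * (1.0 - u / U_M) * (u + U_M) / 2.0

lam_M, U_M = 50.0, 10.0                 # illustrative values
u_seq = [2.0, 8.0, 5.0, 3.0, 7.0]       # an arbitrary dynamic price sequence
u_c = sum(u_seq) / len(u_seq)           # equivalent static price
var_u = sum((u - u_c) ** 2 for u in u_seq) / len(u_seq)

dyn = sum(utility_rate(u, lam_M, U_M) for u in u_seq) / len(u_seq)
stat = utility_rate(u_c, lam_M, U_M)
# Proposition 2: stat - dyn equals lam_M * var_u / (2 * U_M).
```

The agreement is exact here because the utility rate is quadratic in u, so averaging over the sequence loses exactly the variance term.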
This is an important result on two counts: First, provision
of Regulation Service reserves results in utility loss on the
part of building appliance users. Hence, even if the SBO
is able to track the RS signal in real-time and avoid tracking error penalties, it must take this prospective utility loss
into consideration while bidding for energy and reserves in
the hour ahead market. Second, responding to RS signals aggressively will certainly result in a higher utility loss. The clear trade-off between tracking error cost and utility justifies the effort to determine an optimal price policy, to which we turn next.
V. DETERMINING THE OPTIMAL DYNAMIC POLICY
In this section, we propose and implement two distinct solution approaches to the DP problem formulated in
Section III-D: (i) A well known Linear Programming (LP)
based method [15], which requires discretization of the allowable control space, and (ii) an innovative computationally
tractable Approximate Policy Iteration (API) algorithm which
bypasses the need to discretize the allowable control space
through reliance on an analytic functional approximation of the
control policy inspired by optimal policy structural properties
revealed in numerical experience with the LP approach.
A. Linear Programming Based DP Solution Approach
Denoting for notational simplicity the discrete state (n, y, d)
by x and the set of all possible states by X, i.e., x ∈ X ⊂ Z, and
discretizing the control set u ∈ U = {0, u, 2u, . . . , mu},
where u = UM /m, allow us to write the optimalityconditions-representing Bellman equation as the following
Linear Program ([15])
observation that changes in y and d shifts the sigmoid along
the axis that measures the tracking error.
Motivated by the discretized optimal policy properties
discussed above, we propose a continuous optimal price
policy approximation by optimizing the parameters θ =
[θ1 , θ2 , θ3 , θ4 ] in the following sigmoid function
U ∗ (x) ≈ uθ (n, y, d) =
J
Maximize
J,h
Jv + h ≤ g(u) + P(u)h,
∀u ∈ U
(7)
where
• h and g(u) are vector notations for h(x) and g(x, u)
respectively for ∀x ∈ X,
• h(x) and J are the differential cost and average cost rate
as defined in Section III-D,
• g(x, u)
=
E [tκ((n + n)r − (A + Ry))2 −
z|x
tλM (1 − u/UM )(u + UM )/2] is the expected period
cost for control u,
• x = (n, y, d) and z = (n + n, y + y, d ′ ) represent the
current and the next system state respectively,
• Pxz (u) is the probability of a transition from state x to z
when the price control is u, and P(u) is the corresponding
transition matrix,
• v is vector of ones.
The optimal control policy u∗ is the mapping of a state x
to a policy u∗ (x) obtained by identifying the specific u of the
m + 1 LP constraints considered for each state i, which is
binding in the optimal solution yielding a nonzero dual variable p(x). The dual variables, p(x), are useful LP solution
output as they represent the steady state probability that the
controlled system visits a particular state x using the associated optimal policy u∗ (x) ([15]). This allows us to estimate
the expected utility that the system will realize under optimal
dynamic prices, the expected variance of the dynamic price,
and the expected utility loss as discussed in Section IV-B.
Numerical solutions to various instances of the discretized
DP problem reveal interesting properties of the optimal policy,
indicating that the tracking error, nr − (A + Ry), is a useful
feature of the state space. More specifically, when nr − (A +
Ry) ≈ 0, the optimal price tends to be close to us that satisfies
A + Ry = λM (1 − us /UM )r/µ
µ
us = UM 1 −
(A + Ry) .
λM r
(8)
In light of previous discussion on the distribution of n under
a constant price control, note that us is the price level that
is asymptotically related to (A + Ry)/r active appliances in
expectation. Indeed, recall that the number of active appliances
behaves as an M/M/∞ queue, which is Poisson distributed
in steady state with mean λ/µ. Moreover, numerical results
indicate that, for fixed y and d, the optimal prices are nondecreasing in the tracking error, nr − (A + Ry), approaching 0
and UM asymptotically. Consequently, for fixed y and d, the
optimal price policy tends to have a sigmoid shape as a function of the tracking error. As discussed further in Section VI-B,
the prevalence of the sigmoid property is strengthened by the broader experiments reported there. This motivates the sigmoid functional approximation

u(x) = UM / (1 + e^(θ1 (nr − (A + Ry)) + θ2 y + θ3 d + θ4)).   (9)
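A minimal sketch of the sigmoid policy of Equation (9) follows; the function name `price` and the sample range of n are illustrative choices, while the constants and the initial parameter vector θ(0) = [−1, 1, 0, 0] are taken from Section VI.

```python
import math

# Sketch of the sigmoid price policy of Equation (9); the state is
# x = (n, y, d) and theta = [th1, th2, th3, th4] are the parameters tuned
# by the API algorithm. Constants are the base case of Section VI.
A, R, r, U_M = 50.0, 30.0, 1.0, 50.0

def price(n, y, d, theta):
    th1, th2, th3, th4 = theta
    err = n * r - (A + R * y)  # tracking error
    return U_M / (1.0 + math.exp(th1 * err + th2 * y + th3 * d + th4))

theta0 = [-1.0, 1.0, 0.0, 0.0]  # initial guess used in Section VI-F
# With th1 < 0 the policy is non-decreasing in the tracking error and
# approaches 0 and U_M asymptotically, matching the shape in Figure 3.
prices = [price(n, 0.0, 1.0, theta0) for n in range(40, 61)]
assert all(p1 <= p2 for p1, p2 in zip(prices, prices[1:]))
```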
The functional approximation reduces the computational effort
to the optimal selection of the parameters in θ , a task undertaken in the next section where we develop an approximate
policy iteration (API) algorithm to this end.
B. Novel Approximate Policy Iteration DP
Solution Algorithm
Denote the cardinality of the state space by |X|. Note that,
given nm ≤ n(t) ≤ nM , t ≥ 0 by Assumption 2, we have
|X| = 2(nM − nm + 1)(1 + 2/ȳ). Also recall that the size of
the discretized allowable control set is m + 1. A price policy
is represented in general by a lookup table with |X| number of rows mapping each state x to a price u(x). The LP
based DP solution approach requires solving a linear problem
with |X| + 1 decision variables and (m + 1)|X| constraints. In
contrast, a policy iteration algorithm consists of consecutive
estimates of (i) the average and differential cost for a policy
mapping states to prices, u(x), which requires the solution of
a linear system with |X| number of equations, followed by
(ii) a policy improvement which requires (m + 1) evaluations
for each state x to yield a new table u′ (x) [15].
Based on the optimal policy structure discussed above, we
replace the role of table u(x) mapping a state to a price
by the functional approximation of Equation (9), which, in
fact, reduces the table u(x) to the four parameter vector
θ = [θ1 , θ2 , θ3 , θ4 ]. Denoting a policy by the parameter vector θ, our policy iteration algorithm consists of consecutive
estimates of (i) the average and differential costs Jθ and
hθ (x) for all x ∈ X, that satisfy the stationarity condition for
policy θ, followed by (ii) a policy improvement step that
revises the old policy θ to a new policy θ ′ .
Step (i) is accomplished by solving a linear system or equivalently the following LP with |X| + 1 decision variables and
|X| constraints, where the subscript θ denotes the policy as
specified by Equation (9)
maximize over (Jθ, hθ):   Jθ
subject to:   Jθ v + hθ ≤ g(uθ) + P(uθ) hθ.
The LP solution for a given θ will have all of the constraints binding, i.e., it satisfies a singular linear system with |X| equations in |X| + 1 unknowns; the maximization of Jθ takes care of the singularity
and provides Jθ and hθ (x), ∀x ∈ X as the primal solution. In
addition, the dual variables, pθ (x) for each state x, provide the
vector pθ, which is the eigenvector associated with the unit eigenvalue of
the transition probability matrix [Pxz (uθ )], namely the vector
of steady state probabilities pθ (x) for each state x when the
system is controlled by policy θ ([15]).
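The policy-evaluation step (i) can equivalently be sketched as a direct linear solve, pinning one differential cost to remove the singularity noted above; the 3-state transition matrix P and stage cost g below are toy stand-ins, not the paper's building model.

```python
import numpy as np

# Policy evaluation for a fixed policy: solve the average-cost stationarity
# conditions J + h(x) = g(x) + sum_z P_xz h(z) as a linear system, pinning
# h(0) = 0 to remove the singularity. P and g are a toy 3-state example.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
g = np.array([1.0, 0.0, 2.0])

n = len(g)
# Unknowns: J, h(1), ..., h(n-1), with h(0) := 0.
M = np.zeros((n, n))
M[:, 0] = 1.0                       # coefficient of J
M[:, 1:] = (np.eye(n) - P)[:, 1:]   # coefficients of h(1), ..., h(n-1)
sol = np.linalg.solve(M, g)
J, h = sol[0], np.concatenate(([0.0], sol[1:]))

# The dual information: steady-state probabilities p solve p = p P, i.e.,
# p is the eigenvector of P' associated with the unit eigenvalue.
eigvals, eigvecs = np.linalg.eig(P.T)
p = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
p /= p.sum()
# The average cost equals the stationary expectation of the stage cost.
assert abs(J - p @ g) < 1e-9
```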
Step (ii), however, is not as straightforward: The convergence of the policy iteration algorithm requires the average
cost to decrease in each iteration, i.e., Jθ(k+1) ≤ Jθ(k) ; and the
change in the cost is expressed as follows ([15]):
Jθ(k) − Jθ(k+1) = Σ_{x∈X} pθ(k+1)(x) δθ(k),θ(k+1)(x),   (10)

where we define

δθ(k),θ(k+1)(x) = [ g(x, uθ(k)) + Σ_{z∈X} Pxz(uθ(k)) hθ(k)(z) ]
               − [ g(x, uθ(k+1)) + Σ_{z∈X} Pxz(uθ(k+1)) hθ(k)(z) ].   (11)

TABLE I: Comparison of the actual PJM data and the TDMC model on time frequencies of RS signals.

TABLE II: Comparison of the actual data and the TDMC model on RS signal statistics.
Therefore, in vector notation, we need p′θ(k+1) δ(θ(k),θ(k+1)) ≥ 0. This non-negativity condition is trivially achievable in standard policy iteration, where the ability to select an improved policy uk+1(x) for each state x guarantees the non-negativity of each term in the weighted average above, namely, δuk,uk+1(x) ≥ 0 ∀x ∈ X. In our θ-parameter based functional approximation of the policy, however, there is no obvious way to guarantee non-negativity. Were we able to implement it, an intuitive policy parameter vector improvement would be to solve:

θ(k + 1) = argmax_{θ(k+1)} p′θ(k+1) δ(θ(k),θ(k+1)).   (12)
The problem with the optimization problem in (12) is that
pθ (k+1) is not known until θ (k + 1) is obtained and step (i) of
the policy iteration is repeated. For this reason, we propose
to approximate pθ (k+1) by pθ (k) while imposing a constraint
in the optimization problem that forces θ (k + 1) to be close
to θ(k). Our implementable policy improvement step (ii) is
thus:
max_{θ(k+1) ∈ [θ(k) − Δθ, θ(k) + Δθ]}   p′θ(k) δ(θ(k),θ(k+1))   (13)

with Δθ having all non-negative entries. When the elements of Δθ are small enough, optimization problem (13) satisfies the non-negativity convergence requirement since

pθ(k) ≈ pθ(k+1), which follows from θ(k) ≈ θ(k + 1).   (14)
In the iterative implementation of optimization problem (13), the four-dimensional vector Δθ is updated adaptively as follows: when the condition Jθ(k+1) ≤ Jθ(k) is violated, Δθ is multiplied by ρ, 0 < ρ < 1, and the policy improvement step is repeated; otherwise Δθ is multiplied by 1/ρ. Numerical experience shows that, most of the time, the entries of Δθ do not have to be very small for the non-negativity condition to be satisfied and the policy iteration to converge.
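The adaptive box-scaling rule above can be sketched as follows. This is a simplified illustration under stated assumptions: the quadratic J below is a stand-in for the step-(i) average-cost evaluation, and the crude random box search stands in for solving the improvement problem; neither is the paper's actual implementation.

```python
import numpy as np

# Illustration of the adaptive trust-region rule: propose theta(k+1) inside
# the box theta(k) +/- d_theta, accept only if the cost decreases, shrink
# d_theta by rho on rejection and widen it by 1/rho on acceptance. The
# quadratic J is a toy surrogate for the average-cost evaluation.
rng = np.random.default_rng(0)
theta_star = np.array([-1.2, 0.8, 0.1, -0.4])   # hypothetical optimum
J = lambda th: float(np.sum((th - theta_star) ** 2))

theta = np.zeros(4)
d_theta = np.full(4, 0.5)
rho, eps = 0.5, 1e-3
for _ in range(300):
    if d_theta.max() <= eps:                     # convergence tolerance
        break
    cands = theta + rng.uniform(-d_theta, d_theta, size=(32, 4))
    best = cands[np.argmin([J(c) for c in cands])]
    if J(best) <= J(theta):                      # improvement: accept, widen
        theta, d_theta = best, d_theta / rho
    else:                                        # no improvement: shrink
        d_theta = d_theta * rho

# The accepted iterates never increase the cost.
assert J(theta) <= J(np.zeros(4))
```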
VI. NUMERICAL RESULTS
In this section, we (i) discuss the adequacy of the Two-Dimensional Markov Chain (TDMC) representation of the RS
signal, (ii) use an 11 point discretization of the control set to
obtain the LP-based optimal solution and use it to observe the
structure of the optimal policy and verify the price varianceutility loss properties derived in Section IV, and finally (iii) we
simulate the performance of the discretized control set optimal
policy, solve the sigmoid function based API policy, and show
that it compares favorably to the LP-based optimal policy from
both the accuracy and computational efficiency points of view.
The following input is used in the base case problem definition of this section: A = 50 kW, R = 30 kW, r = 1 kW,
ȳ = 1/30, λM = 150 /min, µ = 1 /min, κ = 100 ¢/kW²,
UM = 50¢, m = 10, t = 1/15 min. We note that the objective function provided in (5) is scaled by 60/t in order to
obtain the optimal cost for an hour rather than t minutes.
Therefore, κ and UM values given here correspond to the cost
coefficient and the utility bound for an hour.
A. Model for RS Signal Dynamics
Since the TDMC model of RS signal dynamics is a key
driver of the optimal solution, we paid particular attention to its calibration and tested its fidelity against historical data.
The range of y was discretized by ȳ = 1/30, requiring
60 possible RS signal values for each direction. The transition probabilities were estimated from actual historical data
of PJM’s “Fast Response Regulation Signal”, available in [14]
and shown in Figure 1. The historical data was first mapped to
the 120 discretized (y, d) states and a frequency distribution
was estimated. The transition frequencies were then used to
estimate the TDMC transition probabilities.
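The calibration procedure described above (discretize, count transitions, normalize) can be sketched as follows; the synthetic `signal` array is an illustrative stand-in for PJM's discretized "Fast Response Regulation Signal" data.

```python
import numpy as np

# Sketch of the TDMC calibration: map a sampled signal to discrete states,
# count observed transitions, and normalize each row into probabilities.
# `signal` is synthetic stand-in data, not the PJM historical signal.
rng = np.random.default_rng(1)
n_states = 8
signal = rng.integers(0, n_states, size=10_000)   # pre-discretized states

counts = np.zeros((n_states, n_states))
for a, b in zip(signal[:-1], signal[1:]):
    counts[a, b] += 1                             # transition frequencies

row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums,
              out=np.zeros_like(counts), where=row_sums > 0)

# Each visited state's outgoing probabilities sum to one.
assert np.allclose(P.sum(axis=1)[row_sums[:, 0] > 0], 1.0)
```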
Table I compares historical data to simulated frequencies
of y(t) in four broad ranges. Finer range comparisons, not
shown due to space limitations, proved quite accurate as well.
Table II compares historical data and model simulated E[y],
σy2 estimates as well as the proportions of time y(t) is in an up
or down sub-trajectory. These observations convinced us of the model's fidelity.
It should also be mentioned that although ISOs aim to
achieve energy neutrality by deploying tertiary reserves on an
hourly basis, and the long run average of y is approximately
zero, the past PJM data shows that energy neutrality does not
always hold in a given hour. In fact, a two-sided t-test for
hourly energy neutrality on the PJM RS signal that belongs
to 10/31/11 reveals a p-value of 0.0107, which would imply
higher tracking costs in practice than expected.
TABLE IV: Comparison of theoretical and experimental utility loss (T.U.L. and E.U.L., respectively) per hour for different values of RS provision.

Fig. 3. Optimal Price Policy for d = +1.

TABLE III: Low sensitivity of average consumption to dynamic optimal prices.
B. Structure of the Optimal Price Policy
Figure 3 shows the optimal prices (u ∈ {0, 0.1UM , . . . , UM },
vertical axis) for different tracking error values (nr − (A + Ry),
horizontal axis), for y = {−1, 0, 1} and d = +1. The optimal
price policy is non-decreasing in the tracking error, and u∗ ≈ us when the tracking error is close to zero. Recall that us is the
price that guides the system to n = (A + Ry)/r in expectation
in steady state when it is broadcast over a long enough period
(see Section V-A).
In Figure 3, the shape of the curve that represents the optimal price policies for fixed y and d is similar to that of a
sigmoid function. Moreover, increasing values of y shift the
sigmoid-like optimal policy to the right, which results in a
lower optimal price for a given tracking error. This observation motivated the approximation of optimal prices by an
analytic sigmoid function whose parameters are selected by
our API algorithm in Section VI-F.
Table III presents the observed time averages of optimal
prices, ū∗ , and connection rates (λ̄∗ ) for different problem
instances. As suggested in Corollary 1, the average prices
satisfy A = λM (1 − ū∗ /UM )r/µ = λ̄∗ r/µ. Note that the optimal policy elicits average consumption that is slightly higher
than A, a result indicating that the tracking error versus utility
loss tradeoff weighs towards a slightly higher tracking error
cost for the benefit of lower utility loss.
C. Price Variance-Utility Loss Relationship
Proposition 2 claims that the Utility Loss due to varying
dynamic optimal prices equals λM σu2 /(2UM ). Table IV results
validate this relationship for different problem instances with
different R values. The theoretical values of utility loss match
with the experimental results obtained using optimal LP dual variables as explained in Section V-A.

Fig. 4. Tracking Error under the Optimal Price Policy.

It is noteworthy that
the price variance and associated utility loss increase with the
RS reserve level. The SBO should consider this relationship
in deciding its preferred RS reserve offer in the Hour Ahead
market.
D. Simulated Consumption Trajectory Under the
Optimal Price Policy
The objective of the optimal policy is to assist the SBO to
track the RS signal the ISO broadcasts. Figure 4 displays a
two hour building consumption trajectory simulated under the
optimal policy. For given appliance consumption characteristics and the RS signal dynamics, Figure 4 shows that the SBO
is able to successfully track the RS-signal-implied obligation, plotted together with the actual consumption (the dotted green
line shows the consumption rate n(t)r of the building at time t,
while the continuous red line shows the obligation A + Ry(t)).
Since the building modeled by the input data is relatively small
(i.e., small λ̄), there is a large coefficient of variation in the associated M/M/∞ queue, and the fluctuations of the consumption rate about the obligation trajectory are noticeable. The average tracking error, E[|nr − (A + Ry)|], is equal to 2.1 kW, which is 7% of the RS provision. Since Section VI-A established the fidelity of the TDMC model of the RS signal behavior, this simulation uses a Markov chain model calibrated with 21,600 data points (24 hours) in place of the historical data.
TABLE V: Generalization of the price policy to multiple appliance classes.

TABLE VI: Invariance of the relative absolute tracking error to the appliance utility distribution.
E. Experiments With Generalized Problem Settings
In the analytical derivations and experiments above, we considered a single appliance class with a utility distribution that is uniform over [0, UM]. In reality, however, appliances are clustered in more than one class. In this section, we numerically explore how the policies obtained under those assumptions can be extended to more general cases. First, we experiment with cases where there are multiple appliance classes. To this end, we use the index j for the class-specific values of the problem parameters, such as λM_j, rj, µj, uj. The optimal policies used in VI-D are obtained for an appliance class
with r = 1 kW and µ = 1/min. As the quantity r/µ specifies the expected energy consumption for an appliance per
connection, we consider the optimal prices obtained for this
base case as “unit” prices, ub , and obtain class specific prices
as uj = ub rj/µj. To measure the impact of this generalization, we compare the average values of ǫT/R over the course of a 24-hour simulation for various cases in Table V, where the total consumption capacity, Σ_i λM_i ri/µi, is the same for all cases. Note that ǫT is the absolute tracking error as defined
above. The results show that, although less successfully, the
SBO is able to track the RS signal in the presence of multiple appliance classes and the ǫT /R ratio is below 13% in
all cases.
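The unit-price scaling just described can be sketched as follows; the class names and parameter values are hypothetical examples, not the cases of Table V.

```python
# Sketch of the multi-class price scaling: prices computed for the base
# class (r = 1 kW, mu = 1 /min) are treated as "unit" prices u_b and scaled
# by each class's expected energy per connection, r_j / mu_j. The class
# names and parameters below are illustrative, not the cases of Table V.
u_b = 33.3                                           # unit price, cents
classes = {"hvac": (2.0, 0.5), "ev": (5.0, 0.25)}    # j -> (r_j, mu_j)

u = {j: u_b * r_j / mu_j for j, (r_j, mu_j) in classes.items()}
# e.g. the "ev" class, with r/mu = 20 kWh expected per connection, is
# charged 20 times the unit price.
assert abs(u["ev"] - 20.0 * u_b) < 1e-9
```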
Secondly, in Table VI, we show how the relative absolute
tracking error, ǫT /R, differs when Assumptions 2 and 3 are
relaxed, namely when non-uniform distributions are used to
model the appliance utility. Numerical results show that the tracking performance is almost invariant to the utility distribution, and the SBO tracks the RS signal as successfully as when the uniform distribution is used.
F. Sigmoid Function Approximation to Optimal Price
Policy and Implementation of API Algorithm
Figure 3 motivates a continuous functional approximation of the optimal price policy by a sigmoid function whose parameters are optimized by the API algorithm of Section V-B.

TABLE VII: Comparison of LP and API solutions.
The value of this approximation relies on: (i) The effectiveness of the API algorithm in converging to the optimal
parameter values, and (ii) the validity of the sigmoid function in representing the optimal price policy. We were able
to verify the former by least squares fitting of a sigmoid
function to the optimal discrete prices obtained from the LP
solution approach. To investigate the latter, optimal objective function values of the two methods are compared in
Table VII. As this comparison shows, not only does the
API algorithm give close results to the LP solution, but the
objective values obtained with the API algorithm generally
dominate the LP solution for low discretization accuracy, e.g., m = 5.
In the API algorithm, the initial parameters are simply chosen as θ(0) = [−1, 1, 0, 0] based on the observation that the
optimal price is non-decreasing in the tracking error and nonincreasing in the RS signal value, as per Figure 3. We also
observe that the LP solution computational effort is particularly long because the transition probabilities over t = 4 sec
result in dense constraints. Moreover, as expected, increasing the discretization accuracy by using higher m values significantly increases the solution time. On the other hand, in the
API algorithm, less than half of the total computation time
is used in fine tuning around a small convergence tolerance,
and the solution time can be reduced significantly by selecting large ǫ values without deviating from the optimal objective
value significantly. The results reported in Table VII have been
obtained with the following input: κ values are equal to 1,
5, 20, and 1 in cases B1, B2, B3, and B4, respectively, where A = 50 and λM = 100 for cases B1, B2, and B3. Case B4 uses
A = 100 and λM = 150, and the contribution of the utility in
the objective function is neglected in this case. In all cases, the
API algorithm employed ρ = 0.5 and convergence tolerance
ǫ = 0.1.
VII. CONCLUSION
We have shown that optimal SBO managed policies can be
obtained efficiently and implemented for effective provision of
RS reserves by flexible loads. To this end, we investigated the
structure and properties of the underlying Stochastic DP problem and proposed an efficient Approximate Policy Iteration
algorithm to solve it, which we also extended to multiple appliance classes and non-uniform utility distributions. Exploring
the impact of removing the assumptions on the optimal price
policy analytically is a vital component to explore in future
studies.
A PPENDIX
A. Proof of Proposition 1
Proof: The expected number of connections during
[kt, (k + 1)t] is λM (1 − u(k)/UM )t, and the expected
energy consumption associated with these connections is
tλM (1 − u(k)/UM )r/µ, since r is the consumption rate of an
active appliance and an appliance is expected to stay active for
1/µ amount of time. Then, the expected average consumption
committed over [0, T], denoted by E[C[0,T] ], is
E[C[0,T]] = (1/T) Σ_{k=0}^{K−1} t (λM r/µ) (1 − u(k)/UM)
          = (λM r/µ) (1 − [(1/T) Σ_{k=0}^{K−1} u(k)t] / UM).   (15)
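Equation (15) says the expected average consumption depends on the price trajectory only through its time average; this identity can be checked numerically with an arbitrary price trajectory (the values below are illustrative, not from the paper).

```python
import numpy as np

# Numerical check of Proposition 1 / Equation (15): expected average
# consumption over [0, T] depends only on the time average of the prices.
# The price trajectory u(k) below is arbitrary illustrative data.
lam_M, r, mu, U_M = 150.0, 1.0, 1.0, 50.0
dt = 1.0 / 15.0
u = np.array([10.0, 35.0, 22.5, 41.0, 5.0])   # arbitrary u(k), cents
T = len(u) * dt

dynamic = (1.0 / T) * np.sum(dt * lam_M * (1.0 - u / U_M) * r / mu)
static = lam_M * r / mu * (1.0 - u.mean() / U_M)   # same time-average price
assert abs(dynamic - static) < 1e-9
```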
B. Proof of Corollary 1
Proof: For large κ, the tracking error penalty dominates the
utility loss, and, for large enough λM under Assumption 1, the optimal price policy u∗ results in an average consumption rate of
A kW. By Proposition 1 the dynamic pricing policy u∗ and the
static price policy ū∗ result in the same average consumption.
However, under static price policy ū∗ , n(t) is equivalent to an
M/M/∞ queue which is Poisson distributed with mean λ̄∗ /µ
implying average consumption λ̄∗ r/µ = A.
C. Proof of Proposition 2
Proof: We express the total utility realized over [0, T] under
a dynamic price policy as the summation of the utilities
realized over the disjoint sub-intervals, i.e.,

E[u[0,T]] = E[ Σ_{k=0}^{K−1} u[kt,(k+1)t] ]   (16)
          = Σ_{k=0}^{K−1} E[ u[kt,(k+1)t] ],   (17)

where the second equality follows from the fact that the Poisson connections in each sub-interval are independent of each other. The expected number of connections during [kt, (k + 1)t] is λM(1 − u(k)/UM)t. An appliance that connects at time t ∈ [kt, (k + 1)t] realizes a utility φ, where E[φ|φ ≥ u(k)] = (u(k) + UM)/2. Then the expected utility realized over [kt, (k + 1)t] is given by

E[u[kt,(k+1)t]] = tλM(1 − u(k)/UM)(u(k) + UM)/2.   (18)
Therefore,

E[u[0,T]] = Σ_{k=0}^{K−1} tλM(1 − u(k)/UM)(u(k) + UM)/2,

where T = Kt. Letting ǫ(k) = u(k) − uc during [kt, (k + 1)t], i.e., u(k) = uc + ǫ(k) and therefore Σ_{k=0}^{K−1} ǫ(k) = 0, this can be written as

E[u[0,T]] = λM T [ ((UM)² − (uc)²)/(2UM) − (Σ_{k=0}^{K−1} (ǫ(k))² t)/(2UM T) ]   (19)
          = (λM T/(2UM)) ((UM)² − (uc)²) − (λM T/(2UM)) Σ_{k=0}^{K−1} (ǫ(k))² t/T
          = E[uc[0,T]] − (λM T/(2UM)) Σ_{k=0}^{K−1} (ǫ(k))² t/T.   (20)

The first component of the expression above is the expected utility that would be realized under a static price policy uc over [0, T], while the second term, which we call Utility Loss, represents a decrease in the static price utility whose magnitude increases in proportion to the time average of (ǫ(k))². Since ǫ(k) = u(k) − uc, the summation term equals the variance of u(k) over the trajectory k = 0, 1, . . . , K − 1, σu². Therefore, we can express the Utility Loss per unit time as

(1/T) ( E[uc[0,T]] − E[u[0,T]] ) = λM σu² / (2UM).   (21)

The result follows when (1/T) Σ_{k=0}^{K−1} u(k)t = uc. Note that this proposition also holds when the price updates are made in continuous time, as is evident when t → 0.
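The identity in Equation (21) can also be verified numerically: summing the per-interval utilities of Equation (18) and comparing against the static-price utility reproduces the λM σu²/(2UM) loss exactly. The price trajectory below is arbitrary illustrative data.

```python
import numpy as np

# Numerical check of Proposition 2 / Equation (21): the per-unit-time gap
# between the static-price utility and the dynamic-price utility equals
# lam_M * var(u) / (2 * U_M). The price trajectory u(k) is arbitrary.
lam_M, U_M, dt = 150.0, 50.0, 1.0 / 15.0
u = np.array([10.0, 35.0, 22.5, 41.0, 5.0, 30.0])   # u(k), cents
K = len(u)
T = K * dt
u_c = u.mean()                                      # time-average price

dyn = np.sum(dt * lam_M * (1.0 - u / U_M) * (u + U_M) / 2.0)  # Eq. (18) summed
stat = lam_M * T * (U_M**2 - u_c**2) / (2.0 * U_M)            # static price u_c
loss_per_time = (stat - dyn) / T
# np.var uses the population variance, matching sigma_u^2 in the proof.
assert abs(loss_per_time - lam_M * np.var(u) / (2.0 * U_M)) < 1e-8
```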
REFERENCES
[1] S. G. Whitley, “New challenges facing system operators,” in Proc. Cornell Univ. Workshop Past Present Future
Power Grid, New York, NY, USA, Aug. 2012. [Online].
Available:
http://www.nyiso.com/public/webdocs/media_room/
publications_presentations/NYISO_Presentations/NYISO_Presentations/
New_Challenges_Facing_System_Operators-_S_Whitley_08_09_12.pdf
[2] Y. V. Makarov, C. Loutan, J. Ma, and P. de Mello, “Operational impacts
of wind generation on California power systems,” IEEE Trans. Power
Syst., vol. 24, no. 2, pp. 1039–1050, May 2009.
[3] G. Gowrisankaran, S. S. Reynolds, and M. Samano, "Intermittency and the value of renewable energy," Nat. Bur. Econ. Res., Cambridge, MA, USA, Tech. Rep. 17086, May 2011.
[4] K. Porter et al., “Review of industry practice and experience in the integration of wind and solar generation,” GE Energy, Atlanta, GA, USA,
Nov. 2012. [Online]. Available: http://bit.ly/21b6oE5
[5] ISONE. (2014). Website. [Online]. Available: http://www.iso-ne.com
[6] E. Bilgin and M. C. Caramanis, “Decision support for offering loadside regulation service reserves in competitive power markets,” in
Proc. 52nd IEEE Conf. Decis. Control, Florence, Italy, Dec. 2013,
pp. 5628–5635.
[7] F. C. Schweppe, “Power: Power systems ‘2000’: Hierarchical control
strategies: Multilevel controls and home minis will enable utilities to buy
and sell power at ‘real time’ rates determined by supply and demand,”
IEEE Spectr., vol. 15, no. 7, pp. 42–47, Jul. 1978.
[8] A. W. Berger and F. C. Schweppe, “Real time pricing to assist in load
frequency control,” IEEE Trans. Power Syst., vol. 4, no. 3, pp. 920–926,
Aug. 1989.
[9] N. Lu and D. P. Chassin, “A state-queueing model of thermostatically controlled appliances,” IEEE Trans. Power Syst., vol. 19, no. 3,
pp. 1666–1673, Aug. 2004.
[10] M. D. Galus, S. Koch, and G. Andersson, “Provision of load frequency control by PHEVs, controllable loads, and a cogeneration
unit,” IEEE Trans. Ind. Electron., vol. 58, no. 10, pp. 4568–4582,
Oct. 2011.
[11] I. C. Paschalidis, B. Li, and M. C. Caramanis, “Demand-side management for regulation service provisioning through internal pricing,” IEEE
Trans. Power Syst., vol. 27, no. 3, pp. 1531–1539, Aug. 2012.
[12] M. Caramanis, I. C. Paschalidis, C. Cassandras, E. Bilgin, and E. Ntakou,
“Provision of regulation service reserves by flexible distributed loads,”
in Proc. 51st IEEE Conf. Decis. Control, Maui, HI, USA, Dec. 2012,
pp. 3694–3700.
[13] I. C. Paschalidis and J. N. Tsitsiklis, “Congestion dependent pricing of
network services,” IEEE ACM Trans. Netw., vol. 8, no. 2, pp. 171–184,
Apr. 2000.
[14] PJM. (2013). Market-Based Regulation. [Online]. Available:
http://pjm.com/markets-and-operations/ancillary-services/mkt-based-regulation.aspx
[15] D. Bertsekas, Dynamic Programming and Optimal Control, vol. 2,
3rd ed. Belmont, MA, USA: Athena Scientific, 1995.
Enes Bilgin received the B.S. degree from Bilkent
University in 2010, and the M.S. and Ph.D. degrees
from Boston University in 2012 and 2014, respectively. He is currently an SMTS Operations Research
Scientist with Advanced Micro Devices, Inc., and
an Adjunct Faculty with the Ingram School of
Engineering, Texas State University. His research
interests include demand response in smart grids,
Markov decision processes, and supply chain optimization in automated humanitarian missions. He
was a recipient of numerous awards, including an
Honorable Mention in the IBM/IEEE Smarter Planet Challenge.
Michael C. Caramanis received the B.S. degree
from Stanford University in 1971, and the M.S.
and Ph.D. degrees from Harvard University in 1972
and 1976, respectively. Since 1982, he has been a
Mechanical Engineer with Boston University, where
he is a Professor of Systems. He was the Chair
of the Greek Regulatory Authority for Energy and
the International Energy Charter’s Investment Group
from 2004 to 2008. He was active in power market implementations in England and Italy. He has
co-authored the book Spot Pricing of Electricity
(Kluwer, 1987) and 100+ refereed publications. His disciplinary background
is in mathematical economics, optimization, and stochastic dynamic decision
making. His recent application domain focus is marginal costing and dynamic
pricing on smart power grids, grid topology control for congestion mitigation,
and the extension of power markets to include distribution connected loads,
generation, and resources.
Ioannis Ch. Paschalidis (M’96–SM’06–F’14)
received the M.S. and Ph.D. degrees in electrical engineering and computer science from
the Massachusetts Institute of Technology (MIT),
Cambridge, MA, USA, in 1993 and 1996, respectively. In 1996, he joined Boston University,
where he has been ever since. He is a Professor
and the Distinguished Faculty Fellow with the
Department of Electrical and Computer Engineering,
the Division of Systems Engineering, and the
Department of Biomedical Engineering, Boston
University. He is the Director of the Center for Information and Systems
Engineering. He has held visiting appointments with MIT and Columbia
University, New York, NY, USA. His current research interests lie in the
fields of systems and control, networking, applied probability, optimization,
operations research, computational biology, and medical informatics.
Christos G. Cassandras received the B.S. degree
from Yale University in 1977, the M.S.E.E. degree
from Stanford University in 1978, and the S.M.
and Ph.D. degrees from Harvard University in 1979
and 1982, respectively. From 1982 to 1984, he was
with ITP Boston, Inc., where he worked on the
design of automated manufacturing systems. From
1984 to 1996, he was a Faculty Member with the
Department of Electrical and Computer Engineering,
University of Massachusetts, Amherst. He is currently a Distinguished Professor of Engineering, the
Head of the Division of Systems Engineering, and a Professor of Electrical
and Computer Engineering with Boston University. He specializes in the areas
of discrete event and hybrid systems, cooperative control, stochastic optimization, and computer simulation with applications to computer and sensor
networks, manufacturing systems, and transportation systems. He has published over 350 papers in the above areas and five books. He was a recipient of
several awards, including the 2011 IEEE Control Systems Technology Award,
the Distinguished Member Award of the IEEE Control Systems Society
(2006), the 1999 Harold Chestnut Prize (IFAC Best Control Engineering
Textbook), the IBM/IEEE Smarter Planet Challenge Prize in 2011 and 2014,
the 2012 Kern Fellowship, and the 1991 Lilly Fellowship. He was the Editor-in-Chief of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL from 1998
to 2009, and has served on several editorial boards and as a Guest Editor for
various journals. He was the 2012 President of the IEEE Control Systems
Society. He is a Member of Phi Beta Kappa and Tau Beta Pi, and a Fellow
of the IFAC.