Plug and Play! A Simple, Universal Model For Energy Disaggregation

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

Plug and Play!

A Simple, Universal Model for Energy


Disaggregation

Guoming Tang, Kui Wu Jingsheng Lei Jiuyang Tang


Department of Computer School of Computer and Information Systems
Science Information Engineering Engineering Lab
University of Victoria, Victoria, Shanghai University of Electric National University of Defense
BC, Canada Power, China Technology, China
arXiv:1404.1884v1 [cs.AI] 7 Apr 2014

ABSTRACT much energy each appliance consumes, 2) take neces-


Energy disaggregation is to discover the energy consump- sary actions to save energy, and 3) participate in utility
tion of individual appliances from their aggregated energy demand response programs. Furthermore, with smart
values. To solve the problem, most existing approaches rely meters broadly deployed in many countries, sufficiently
on either appliances’ signatures or their state transition pat- high resolution of energy data can be collected, mak-
terns, both hard to obtain in practice. Aiming at developing a ing it feasible to develop efficient energy disaggregation
simple, universal model that works without depending on so- solutions.
phisticated machine learning techniques or auxiliary equip- Due to its critical meaning, the energy disaggregation
ments, we make use of easily accessible knowledge of ap- problem has attracted more and more attention since
pliances and the sparsity of the switching events to design 1980s. Recently, it has also drawn attention from both
a Sparse Switching Event Recovering (SSER) method. By large electronics companies and small start-ups, such
minimizing the total variation (TV) of the (sparse) event ma- as Intel, Belkin, GetEmme, and Navetas. While many
trix, SSER can effectively recover the individual energy con- methods have been developed for energy disaggregation,
sumption values from the aggregated ones. To speed up the according to [26], no solutions work well for all types of
process, a Parallel Local Optimization Algorithm (PLOA) household appliances. They either work poorly for new
is proposed to solve the problem in active epochs of ap- types of appliances or require complex machine learning
pliance activities in parallel. Using real-world trace data, methods to learn appliances’ (latent) features.
we compare the performance of our method with that of the 1.1 Motivations
state-of-the-art solutions, including Least Square Estimation
(LSE) and iterative Hidden Markov Model (HMM). The re- We are motivated to develop a simple and broadly
sults show that our approach has an overall higher detection applicable solution for energy disaggregation, based on
accuracy and a smaller overhead. the following observations:

Categories and Subject Descriptors • Most existing methods are based on appliances’
energy usage patterns, also called the signatures
H.5.m [Information Systems]: Information interfaces of appliances, which are hard to obtain without
and presentation—Miscellaneous particular machine learning techniques or auxiliary
measurements. For example, in [11], extra equip-
General Terms ments are needed to detect the activities of ap-
Algorithm, Measurement, Experimentation pliances based on high frequency electromagnetic
interference (EMI).
Keywords
• The rated power of an appliance is normally avail-
Energy Disaggregation, Data analysis, Optimization
able in practice, from users’ manual, technical spec-
1. INTRODUCTION ification or public web sites such as [8].
Energy disaggregation, also known as non-intrusive • The temporal sparsity of the on/off switching events
appliance load monitoring (NIALM), aims to learn the has been recognized as a general feature suitable
energy consumption of individual appliances from their for most appliances, i.e., an appliance cannot switch
aggregated energy consumption values, e.g., the total on/off many times in a very short time period.
energy consumption of a house. With accurate en- Nevertheless, this property has not been fully en-
ergy disaggregation, the household can 1) learn how trusted with an important post in energy disaggre-

1
gation. To develop a universal solution for energy build either steady or transient signal features of ap-
disaggregation, the sparsity feature could play an pliances with labeled training datasets. The signal fea-
important role and should be regarded as a signif- tures are treated as the appliances’ signatures [20, 18],
icant knowledge. based on which event detection schemes are developed
to detect appliances’ on/off as well as different running
1.2 Our Contributions states. The detected events are ascribed to certain ap-
Aiming at establishing an easy-to-use, universal model pliances’ activities via classification [6, 24, 17]. In ad-
for energy disaggregation, we make the following con- dition to time-domain signal features, spectral analysis
tributions in this paper: has also been adopted to search for appliances’ signa-
tures in the frequency domain [19, 23, 11].
• We do not rely on appliances’ signature. Instead, The performance of signature based approaches de-
we use the appliances’ rated power and power de- pends greatly on the uniqueness of an appliance’s sig-
viation, which are easy to obtain, e.g., from the nature. In practice, however, the signatures of differ-
user’s guide of appliances. With experimental eval- ent appliances may overlap with each other, causing
uation, we show that the method is robust even if inaccurate event detection. Even for the same type of
this information is not very accurate. appliances, it may be hard to obtain the widely accept-
• Based on the simple power model and the sparsity able signature [26]. In other words, it is hard to gen-
property of appliance activities, we establish a uni- eralize the signature learned from a particular device’s
versal Sparse Switching Event Recovering (SSER) operating data. Consequently, even though a method
optimization model. Unlike existing methods that may have a good performance over a specific group of
minimize the aggregated residual value, our method appliances, it may suffer in other datasets, caused by
tries to minimize the total variation of on/off switch- the over-fitting problem due to over-specific signatures.
ing events. The new objective function, while very Due to these difficulties, it is not easy to use signa-
effective as we will show later, has never been ex- ture based methods for unambiguous appliance detec-
plored before to solve the energy disaggregation tion and classification.
problem.
• We develop a Parallel Local Optimization Algo-
2.2 State Transition Based Methods
rithm (PLOA) to solve SSER, which can signifi- A number of methods made use of state transition
cantly reduce the computational complexity of the in appliances’ activities for energy disaggregation. Re-
original problem and is guaranteed to obtain the cently, the Hidden Markov Model (HMM) was adopted
optimal solution if some weak hypotheses hold. to model the state transition patterns of appliances.
The hidden states of each appliance at each time in-
• We build a small-scale energy monitoring platform stant are predicted by inference algorithms, such as the
for a group of household appliances, and evaluate Viterbi algorithm, with the observed emission probabil-
our method using the real-world trace data col- ities [13, 22]. Non-negative sparse coding was proposed
lected over the platform. The experimental re- to solve the energy disaggregation problem in [14]. It
sults indicate that our approach has an overall was further discussed in [9], in which a training pro-
better performance than state-of-the-art solutions, cess was needed to obtain the basis vector related to
including the well-known Least Square Estimation the state transition patterns of different appliances. Al-
(LSE) method and a recently-developed machine though some other works, such as [25], were not to solve
learning method using iterative Hidden Markov the energy disaggregation problem, they also utilized
Model (HMM). the appliance state transition information and their re-
sults may be helpful for energy disaggregation.
2. RELATED WORK The methods in this category usually need a large
Tremendous research efforts have been devoted to number of trainings, and thus are time consuming. In
solving the energy disaggregation problem. The exist- addition, the performance highly relies on the pattern
ing approaches can be roughly divided into two cat- of appliances’ activities in the training datasets, and as
egories: signature based methods and state transition such the performance may vary significantly from test
based methods. to test.
2.1 Signature Based Methods Finally, in the above two types of approaches, op-
Most approaches are based on appliances’ signatures, timization algorithms were used to search for optimal
i.e., specific features such as the real/reactive power, solutions. Generally, the objective was to minimize the
current, and voltage of running appliances [12]. These difference between the predicted value and real aggre-
methods need the support of high sampling rate and gation value. For example, in [24], Least Square Esti-

2
mation (LSE) was used to find the tightest fit for the
aggregated waveform. Nevertheless, as we will disclose
in this paper, such solutions usually do not match well
the true switching events of appliances, leading to inac-
curate energy disaggregation results.

3. SPARSITY OF SWITCHING EVENTS


For ease of reference, we list the notations in Table 1.

Table 1: Table of Notations


Symbol Explanation
S state matrix
state vector of the n-th appliance along
S (n)
the timeline
St state vector of all appliances at time t
(n)
St state of the n-th appliance at time t
St:t+` states of all appliances from time t to t + `
∆S event matrix Figure 1: Energy consumption and appliances’
(n) switching event of the n-th appliance at on/off switching events in a house over the
∆St time t course of a day [15]
D differential matrix
X aggregated power vector
power readings of the n-th appliance along We denote the on/off states of N appliances from
X (n) time t = 1 to T with a state matrix, S, defined as
the timeline
aggregated power reading of all appliances
 (1) (1) (1)

Xt S1 S2 · · · ST
at time t  (2) (2) (2) 
 S1 S2 · · · ST 
power reading of the n-th appliance at S := 
Xt
(n)
 .. .. .. .. , (1)
time t  . . . . 
aggregated power readings of all appli- (N ) (N ) (N )
Xt:t+` S1 S2 · · · ST
ances from time t to t + `
(n)
I stand-by power vector in which St represents the on/off state of the n-th
In stand-by power of the n-th appliance (n) (n)
appliance at time t, and St ∈ {0, 1} with St = 1
P rated power vector indicates the n-th appliance is on and 0 otherwise.
Pn rated power of the n-th appliance Then, the on/off switching events of the N appliances
P0 baseline power of a house from t = 2 to T can be indicated by an event matrix,
Θ power deviation vector ∆S, calculated as
Θn power deviation of the n-th appliance
W set of active epochs ∆S = SD, (2)
Wk the k-th active epoch where D is a differential matrix with size of N -by-(N −
Γ set of power modes 1):
Γnk the k-th power mode of the n-th appliance  
−1
 1 −1
Fig. 11 shows an example of energy consumption and

 
appliances on/off switching events in a typical house
 .. 
 1 . 
during one day. From the figure, we can see that: D :=    (3)
 . . . −1


• As shown in Fig. 1-a, the appliances do not switch 
 1 −1

on/off frequently in the whole time period. 1
• Most switching events happen in a small number (n)
The element of event matrix ∆St ∈ {−1, 0, 1}, with
of time intervals, which we call active epochs (refer (n)
to Section 5.2 for formal definition) and are illus- ∆St = 1 or −1 indicating a switching on or off event
trated with shaded windows in Fig. 1-b. of the n-th appliance at time t, respectively, and 0 no
switching event. Since the sampling rate of current
1
The figure is borrowed from [15] with slight modification smart meters can reach 1 to 10 samples per second [4],
for better illustration. we neglect the situation where an appliance has a se-

3
ries of switching events within a sampling interval, i.e., is bounded by:
(n)
|∆St | < 2. (1 − St )T I + StT (P − Θ) ≤ Xt ,
Assertion 1. According to our real-world observa- (8)
tions, ∆S is a sparse matrix. (1 − St )T I + StT (P + Θ) ≥ Xt ,
where 1 is the all-one vector. In other words, the fol-
4. SYSTEM MODEL lowing constraints hold:

4.1 Power Pattern X − S T (P + Θ) − (I − S)T I ≤ 0,


We focus on the aggregated power readings of a num- (9)
S T (P − Θ) + (I − S)T I − X ≤ 0,
ber of appliances in a house, and arrange them from
time t = 1 to T as an aggregated power vector, where I is the N -by-T all-one matrix.
T
X := [X1 , X2 , · · · , XT ] . (4) 4.2 Sparse Switching Events Recovering
Without loss of generality, all vectors in the paper In order to solve the energy disaggregation problem,
are column vectors. Note that we slightly abuse the we transform the original problem, which aims at break-
notation of T to denote both time and the transpose ing up the aggregated power readings to individual ap-
of a vector/matrix. From the context, however, it is pliance at each time instant, to an alternative one, which
easy to figure out the difference, since when T is used aims at recovering the on/off states of individual appli-
as the superscript of a vector/matrix, it always means ance at each time instant. Thus, the problem is formally
the transpose of the vector/matrix in this paper. defined as:
The power pattern of an appliance indicates the en-
• Input: Aggregated power vector X, power pat-
ergy consumption value when it is turned on or in stand-
tern (I, P, Θ).
by state. In this paper, we use a simple power model
which can be easily obtained from the user’s guide or the • Output: State matrix S, i.e., the on/off states of
specification of an appliance. We represent the power all appliances along the timeline.
pattern of an appliance n by a tuple (In , Pn , Θn ), where
In is its stand-by power, Pn is its rated power, and Θn A Sparse Switching Event Recovering (SSER)
is its power deviation. model is established to recover the states of N appli-
Assume that a house is equipped with N appliances. ances from time t = 1 to T .
We define a stand-by power vector to represent their min TV(∆S)
S
stand-by powers as
T
s.t. X − S T (P + Θ) − (I − S)T I ≤ 0, (10)
I := [I1 , I2 , · · · , IN ] , (5) T T
S (P − Θ) + (I − S) I − X ≤ 0,
a rated power vector to represent their rated powers as
where ∆S is defined by (2) and TV(·) denotes the total
T
P := [P1 , P2 , · · · , PN ] , (6) variation of the event matrix calculated by
and a power deviation vector to represent their power T
N X
(n)
X
TV(∆S) := ∆St . (11)

deviations as
T n=1 t=1
Θ := [Θ1 , Θ2 , · · · , ΘN ] . (7)
After obtaining the on/off states of each appliance
Definition 1. Given a house with a certain number along the timeline, we can estimate its power readings
of appliances, we call the sum of the appliances’ stand- with its rated power at each time instant. Therefore,
by power, denoted by P0 , as the baseline power of the we can get an approximate estimation of the power
house, i.e., P0 = kIk1 . consumption of each appliance. This is equivalent to
solving the original energy disaggregation problem.
Note that virtually all appliances’ stand-by power
could be found from users’ manual, technical specifica- 4.3 On TV Minimization
tion or public websites such as [8, 16]. Theoretically, P0 The total variation (TV) minimization is a classical
should be constant, which is the minimum power of the approach to recovering a sparse matrix. It has been
house at any time instant. In practice, however, there widely applied in signal restoration, image denoising,
are small variations in P0 due to inaccurate stand-by and compressive sensing [3, 21]. To the best of our
power specification, thus it is possible that the actual knowledge, however, it has not been explored in the
power could be below the baseline power. context of energy disaggregation.
At time instant t, given the state vector of all appli- Compared with the dictionary based sparse decod-
ances St , the aggregated power reading Xt , (t = 1, 2, . . . , T ), ing [14], our method does not need a training process

4
to get the basis functions, and the recovered matrix in which the concepts of baseline power and active epoch
with our method has an explicit meaning in practice. are illustrated.
In addition, unlike other optimization methods, such as Algorithm 1 shows the pseudo code of detecting active
least square fitting [24], total variation minimization is a epochs.
type of least absolute deviations fitting, which has been
proved to be more robust for various applications [1]. Algorithm 1 Active Epoch Detection
Require: Aggregated power vector X, baseline power
5. PARALLEL LOCAL OPTIMIZATION P0 .
OVER ACTIVE EPOCHS Ensure: Set of Active epochs, W .
1: t = 1, k = 0
In this section, we first analyze the hardness of solving
2: while t ≤ T do
SSER. To solve the problem efficiently, we then propose
3: start = end = t
a Parallel Local Optimization Algorithm (PLOA) by
4: while Xend > P0 and end < T do
splitting the whole timeline into multiple active epochs.
5: end = end + 1
5.1 Hardness of SSER 6: end while
7: if end > start then
There were significant research efforts to solve the 8: k =k+1
total variation minimization problem [2, 10]. Never- 9: Wk = [start, end]
theless, the form of total variation in our case is a 10: end if
discrete version and involves integer variables. Since 11: t = end + 1
(n)
St ∈ {0, 1}, our problem belongs to binary program- 12: end while
ming, which is much harder to solve than the one with 13: return W = {W1 , W2 , · · · , Wk }
real variables.
With T aggregated power readings generated by N
appliances, to obtain the optimal solution via a brute- 5.3 Parallel Local Optimization Algorithm
force method, it can be shown that the computational
complexity is O 2N ·T (see Appendix B), which is ex- Without loss of generality, we take aggregated load
ponential. Furthermore, we can show that solving SSER data of N appliances from time t = 1 to T as an example
is NP-hard (see Appendix A). Since T is very big in to show the major steps of PLOA.
practice, it seems not possible to find an efficient algo- Step 1: Detect all active epochs along the timeline
rithm that outputs the optimal solution. Nevertheless, with Algorithm 1. Denote the set of active epochs as
the active epochs of on/off events suggest that we can W = {W1 , W2 , · · · , Wk }.
perform optimization in a smaller, local time window. Step 2: In the active epoch starting at t with the
length of `, solve the following optimization problem to
5.2 Detection of Active Epochs obtain St:t+` .
min TV(St:t+` Dt:t+` )
Definition 2. An active epoch of a house is defined St:t+`
as a time interval from the time when the aggregated s.t.
power of the house jumps above the baseline power until
Xt:t+` − (St:t+` )T (P + Θ) − (It:t+` − St:t+` )T I ≤ 0,
the time when the aggregated power drops below the
baseline power. (St:t+` )T (P − Θ) + (It:t+` − St:t+` )T I − Xt:t+` ≤ 0,
(12)
where St:t+` is a N -by-` submatrix of S, Dt:t+` is a `-by-
Appliance 1
(N − 1) submatrix of D, It:t+` is a N -by-` submatrix
Appliance 2
of I, and Xt:t+` is a vector containing the aggregated
Appliance 3
power readings of all appliances from time t to time
Aggregated Load
Baseline t + `.
Baseline

On/Off
Step 3: Perform Step 2 on the k active epochs to
Switching Event
Active Epoch Active Epoch obtain a group of k solutions in parallel. Since outside
of active epochs, appliances are considered as stand-by,
Figure 2: A sketch map to illustrate the concepts a complete state matrix S1:T can thus be built.
of active epoch and baseline power using three We can show that the computational complexity to
appliances solve (12) is O(2N ·` ) (see Appendix B). Since `  T as
shown in Fig 1, the problem can be solved efficiently,
Fig. 2 is a sketch map of switching activities and using tools such as CVX 2.0 with a Gurobi engine [5].
power readings of three appliances with constant power,

5
Theorem 1. Assume that the global optimal solution 6. DATA COLLECTION
to SSER in (10) is S ∗ , and the solution obtained from
POLA is Ŝ, if both solutions are unique, then Ŝ = S ∗ . 6.1 Energy Monitoring Platform
Proof. For an arbitrary active epoch starting at t We evaluated our method with real-world trace data
with the length of `, assume that Ŝt:t+` is the unique from our energy monitoring platform. We monitored
local optimal solution obtained via (12). Assume that the appliances’ energy consumption in a typical labo-
the global optimal solution to SSER in (10) is S ∗ . As- ratory and a lounge room in the fifth floor of Engi-
sume that the sub-matrix constructed by the t-th to neering/Computer Science building at the University of
(t + `)-th columns of S ∗ is St:t+`

. We prove the theo- Victoria (UVic).
rem by contradiction. Using an off-the-shelf solution developed by Current
Assume that Cost (http://www.currentcost.com), we recorded the real-

Ŝt:t+` 6= St:t+` . (13) time power of laptops, desktops and some household
appliances. Each appliance’s real power was measured
Then, the following inequality must hold every 6 seconds by the device called Individual Appli-
∗ ance Monitor (IAM), and the measurement results were
St:t+` Dt:t+` ≥ Ŝt:t+` Dt:t+` . (14)
transmitted via wireless to a display server (EnviR),
Therefore, there must exist another global solution which can display and temporarily store the collected
S ∗∗ , in which the j-th column is data. Then, the data in EnviR were sent to our data

∗∗ Ŝj , j ∈ [t, t + `], server. The platform, the monitored appliances, and
Sj = (15)
Sj∗ , j ∈
/ [t, t + `], the measuring devices are shown in Fig. 3.
We collected the data for three months, and one-week
such that
data were used for performance evaluation in Section 7.
S ∗∗ D ≤ S ∗ D. (16)
6.2 Power Splitting
Obviously, (16) is contradictory to the assumption
that S ∗ is the uniquely global optimal solution to SSER. If the power range of appliance A1 largely overlaps
Therefore, the assumption (13) is not true. As a result, with the power range of appliance A2 , given a power
we have value in the overlapping range, it would be hard to de-

cide which appliance is on. To alleviate this problem,
Ŝt:t+` = St:t+` . (17) we should reduce the overlapping power range of two
Outside the active epochs, PLOA treats all appliances different appliances. This is achieved by a power split-
as stand-by, the TV value is 0 in Ŝ. Since the TV value ting method as follows.
cannot be negative, the TV value obtained with PLOA We can find that some simple appliances like a bulb or
is the minimum and must be the same as that obtained a stove, once turned on, usually have stable power read-
with the global optional solution. ings with small fluctuation. In contrast, complex appli-
Overall, if the global optimal solution is unique, for ances such as refrigerator usually have multiple work-
any time instant t, no matter whether t is in an active ing modes, and the power readings at each mode tend
epoch or outside active epochs, the state vector Ŝt ∈ Ŝ to be stable. Therefore, we can split the power con-
must be equal to the state vector St∗ ∈ S ∗ , which means sumption of complex appliances into multiple modes,
Ŝ = S ∗ . each of which is regarded as a virtual appliance. With
such power splitting, the power overlaps among virtual
5.4 Algorithm Analysis appliances can be narrowed down significantly.
Given T aggregated power readings generated by N With readily available appliance information from user’s
appliances that can be broken into k active epochs with manual or public websites such as [8], we can easily split
maximum size w, the computational complexity of the the power range of an appliance. The splitting result in
original SSER problem (10) is O(2N ·T ). With PLOA, our test scenario is given in Table. 2, where the values of
solving the local optimization problem (12) k times re- power deviations are estimated from the collected power
sults in the time complexity upper bounded by O(k · data of each appliance. One may be concerned that
2N ·w ) (see Appendix B ). Considering that the number the estimation of power deviation in practice is inaccu-
of appliances N is constant and w  T , PLOA signifi- rate. With experimental study, however, our method
cantly cuts down the computational complexity. is resilient to inaccurate power deviation estimations as
Obviously, the larger the value of w, the higher the shown in Section 7.
computational complexity. Fortunately, in practice, each After power splitting, the original state vector of an
active epoch in a house is usually not long. As we will appliance with k modes is extended to a state matrix
show in later experiments, PLOA can indeed provide with k rows, each representing the state vector of a vir-
satisfied solutions. tual appliance. In consequence, the sum of states of

6
5 11
1 3 7
8
IAM

EnviR
2 4 6 10

Figure 3: Energy monitoring platform, monitored appliances and measuring devices

in one mode.
Table 2: Results of power splitting for each ap-
pliance
Rated Power Stand-by 7. EXPERIMENTAL EVALUATION
ID Appliance Mode Power Deviation Power
(Watts) (Watts) (Watts) Using real-world trace data collected from our en-
1 LCD-Dell 1 25 5 0 ergy monitoring platform, we evaluate our method by
2 LCD-LG 1 22 5 0 1) comparing its performance with others’, and 2) test-
3 Desktop 1 40 15 3 ing its robustness with inaccurate parameters.
2 50 20
4 Server 1 130 20 10 7.1 Comparison
5 iMac 1 35 5 3
2 50 10 For comparison purpose, we also implement and test
6 Laptop 1 15 5 1 another two methods: the Least Square Estimation based
2 30 10
3 70 10
integer programming method [24] and the iterative Hid-
7 Printer 1 400 50 2 den Markov Model [22]. The former is a signature based
2 700 80 approach, while the latter is a state transition based ap-
3 900 100 proach.
8 Microwave 1 1000 100 2
2 1200 100
3 1700 100 7.1.1 Least Square Estimation Based Integer
Coffee Programming
9 1 700 100 2
Maker
2 900 100 The Least Squire Estimation (LSE) based integer pro-
3 1100 100 gramming method was adopted in [24]. The current
10 Refrigerator 1 115 15 5 waveform of each appliance was extracted and stored
2 350 10 beforehand, and treated as its signature for energy dis-
Water
11
Cooler
1 65 5 3 aggregation. Since it needs extra devices to obtain the
2 380 10 appliances’ current waveform, for a fair comparison, we
3 450 10 use the rated power listed in Table 2 instead of cur-
rent waveform as the appliances’ signatures and imple-
ment the LSE based algorithm as in [24]. To be specific,
multiple virtual appliances split from the same real ap-
with the same notations mentioned in (10) and (18), the
pliance may be larger than one. To avoid this problem,
LSE-based method in our scenario is formally defined
we add an extra integer constraint in our model if the
as:
n-th appliance has k different modes:
min X − S T P

k S 2
(Γn
i )
X
St ≤ 1, (18) k
(Γn
i ) (Γn
i )
X
i=1 s.t. St ∈ {0, 1}, St ≤ 1, (19)
n i=1
where Γ is the set of modes of the n-th appliance and
Γni is the corresponding row number in state matrix 0 ≤ n ≤ N, 0 ≤ t ≤ T.
(Γn )
for mode i. Thus, St i = 1 indicates that at time t By solving the above optimization problem, we can
the n-th appliance is turned on and working in mode get the states (modes) of appliances at each time in-
(Γn )
i; otherwise, St i = 0. The constraint in (18) means stant, and estimate the energy consumption of each ap-
that at any time instant, the appliance can only work pliance using its rated power.

7
7.1.2 Iterative Hidden Markov Model each method. The evaluation metrics are defined as
As a state transition based method, the iterative Hid- follow.
den Markov Model (HMM) was proposed and tested
for energy disaggregation in [22]. We implement this • Energy Disaggregation Accuracy (EDA): It indi-
method in three phases: the modelling phase, the train- cates the accuracy of assigning correct power val-
ing phase, and the inference phase. ues to corresponding appliances.
PN (n) (n)

− Ŝ P

• In the modeling phase, each appliance is modelled n=1 X n
1
as a prior difference HMM, which is defined by EDA := 1 − , (21)
kXk1
λ := {A, B, π}, (20)
where X (n) , Ŝ (n) and Pn represent the true en-
where A is the prior state transition probability ergy consumption vector, the estimated state vec-
distribution, B is the emission probability distri- tor, and the rated power of the n-th appliance,
bution, and π is the starting state distribution of respectively, and X is the aggregated power vec-
the appliance. In particular, 1) A is initialized tor.
with the transition probabilities proportional to
the time spent in each state, and 2) for any change • State Prediction Accuracy (SPA): It indicates the
between states (or modes) Γni and Γnj of the n-th accuracy of estimating the states of appliances.
appliance, its corresponding emission probability PN (n)

− Ŝ (n)

in B is defined by a Gaussian distributed power n=1 S
1
SP A := 1 − , (22)
consumption N (PΓni −PΓnj , Θ2Γn +Θ2Γn ), where PΓni N ·T
i j
and ΘΓni denote the rated power and power devi-
where S (n) , Ŝ (n) represent the true state vector
ation of the i-th appliance under state (or mode)
and the estimated state vector of the n-th appli-
i, respectively.
ance, respectively, and N, T represent the number
• In the training phase, the prior appliance model λ of appliances and the number of samples, respec-
is tuned by running the expectation maximization tively.
(EM) algorithm over the collected load data [22].
• Running time (R.T.) and memory usage (RAM )2 :
The EM algorithm is initialized with the prior state
They indicate the overhead on running time and
transition matrix A and individual appliances’ rated
memory space, respectively.
power in Table 2. It terminates when a local op-
tima in the log likelihood function is found or the
Since the performance of the iterative HMM method
maximum number of iterations (100 in our imple-
depends on model training, we run this method multiple
mentation) is reached.
times over different sizes (w.r.t. number of samples) of
• In the inference phase, the extended Viterbi al- training datasets (denoted as training size). To be spe-
gorithm [22] was applied to infer each appliance’s cific, we changed the training size from 200, increased
states (or modes), considering the constraints of by 200 each time, up to 4000. The average performance
aggregated power and power changes at each time is calculated over all the runs, and the best and the
instant. worst performance is the best and the worst outcomes
among all the runs, respectively.
By running the above three phases iteratively on each The performance of the three methods on energy dis-
appliance, we can get the estimated states as well as aggregation are summarized in Table 3. In addition, as
the power consumption of each appliance at each time illustrated in Fig. 4, we also look into the overall energy
instant. disaggregation accuracy of the three methods, which in-
dicates the energy contribution of each appliance to the
7.2 Performance Evaluation total energy consumption in the whole time period.
To evaluate the error of energy disaggregation, the From the results, we can draw the following conclu-
Disaggregation Error is usually used [15, 22, 9]. Fur- sions.
thermore, since we know appliances’ states (i.e., the
ground truth) in our dataset, we also evaluate the accu- • In term of accuracy, our SSER method performs
racy of recovered appliances’ states via Hamming Loss [7]. much better than the LSE based method and slightly
Accordingly, we use 1 − DisaggregationError and 1 − better than the iterative HMM method in average.
HammingLoss to get the accuracy of energy disaggre- 2
We implemented the three methods with Matlab 8.0 and
gation and the accuracy of state estimation, respec- run them with 32-bit Windows OS with 3.4GHz CPU and
tively. In addition, we also look into the overhead of 4GB RAM.

8
Table 3: Accuracy and overhead of energy disaggregation, using Sparse Switching Event Recovering
(SSER), Least Square Estimation (LSE) based integer programming and iterative Hidden Markov
Model (HMM)
XXX
XXXMetrics Accuracy Overhead
Methods XXX
X EDA SPA Training Size R.T.(second ) RAM (MB )
SSER 61.12% 69.62% – 865.4 596.8
LSE 33.40% 45.67% – 619.3 581.9
HMM (average) 55.27% 67.47% 2116 3721.3 558.6
HMM (best) 67.26% 71.29% 600 1299.7 557.9
HMM (worst) 41.09% 61.27% 3200 7089.6 561.4

2.5
Real Contributions
Overall Energy Consumption / kWh

Estimated Contributions (SSER)


2 Estimated Contributions (LSE)
Estimated Contributions (HMM−average)
Estimated Contributions (HMM−best)
Estimated Contributions (HMM−worst)
1.5

0.5

0
Coffee Maker iMac Desktop Server Water Cooler Laptop Microwave Printer Refrigerator LCD−Dell LCD−LG

Figure 4: Actual and estimated energy contributions of each appliance to the total energy consump-
tion for one-week time period.

• In term of overhead, our SSER method and the easily learned. However, we may not precisely estimate
LSE method are at a comparative level for running the power deviation of an appliance working under a
time and system memory usage. While the mem- certain mode. Due to this consideration, we test the
ory usage of the iterative HMM method is similar performance of our method, assuming that the power
to that of the other two methods, its running time deviations of appliances are not accurate.
is much longer. For this test, we replace Θ with ρ · Θ, so that we can
narrow down or widen up the estimated power devia-
• The performance of the iterative HMM method is tions by regulating ρ. The value of ρ is changed from
subject to the training process and may have a 0.8 to 1.2, causing a parametric error of power deviation
large variation in accuracy and running time. up to 20%.
7.3 Robustness Test
Regarding the iterative HMM method, as shown in Table 4: Accuracy of Energy Disaggregation us-
Table 3, we have found that 1) the gap between the best ing SSER, with inaccurate estimation on power
and the worst outcomes is significant, and 2) there is no deviation
direct relationship between the size of training dataset PP
Metrics
PP EDA SPA
and the estimation accuracy. These indicate that the ρ PP
PP
iterative HMM method is sensitive to parameter esti- 0.8 55.28% 70.37%
mation in the training phase. Consequently, when us- 0.9 60.33% 70.27%
ing this method in practice, it is not easy to estimate 1.0 61.12% 69.62%
appropriate model parameters that can guarantee the 1.1 56.94% 71.15%
performance. This problem is severe especially when 1.2 59.59% 72.24%
the ground truth of energy disaggregation is unknown
and thus it is hard to judge whether or not a trained
model is good enough. Part of the outcomes are shown in Table 4. We can
The above findings motivate us to perform robust- see that the accuracy does not change too much when
ness test on our method. In practice, we have shown in the parameter error varies, indicating that our method
Section 6.2 that the rated power of an appliance can be is robust to parameter estimation.

9
8. FURTHER DISCUSSION: LIMITATION tion (LSE) method and the machine learning method
OF OUR METHOD using iterative Hidden Markov Model (HMM).
While our method is effective and simple, it has the
following limitations. 10. REFERENCES
First, we assumed that the types and the number of [1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and
appliances in a house are known a prior. While this J. Eckstein. Distributed optimization and
assumption is reasonable if our method is used by the statistical learning via the alternating direction
residents, it may not hold for the utility side because method of multipliers. Found. Trends Mach.
the household may be concerned of privacy and thus Learn., 3(1):1–122, Jan. 2011.
unwilling to provide the above information. In this case, [2] A. Chambolle. An algorithm for total variation
our method may not work well; at least it will need minimization and applications. Journal of
the help of other sophisticated methods to detect the Mathematical imaging and vision, 20(1-2):89–97,
appliances used in the house first. 2004.
Second, our method cannot automatically adjust if [3] A. Chambolle and P.-L. Lions. Image recovery via
new appliances are added to or existing appliances are total variation minimization and related problems.
removed from the house. This problem is similar to Numerische Mathematik, 76(2):167–188, 1997.
the first one. We need to assume that the household [4] C. Cost. Products - envir.
is collaborative and should inform the system whenever http://www.currentcost.com/product-envir.html/,
there is a change on the major appliances. Otherwise, 2013. [Online; accessed 15-December-2013].
our method would not work well. [5] CVX. Matlab software for disciplined convex
Finally, we assumed that the set of appliances is sta- programming. http://cvxr.com/cvx/, 2013.
ble during the time in consideration. This may not be [Online; accessed in 23-July-2013].
always true, since a house may include some small ad [6] M. Dong, P. C. Meira, W. Xu, and W. Freitas. An
hoc devices, e.g., mobile phones, which are charged and event window based load monitoring technique for
then unplugged from the power sockets. Fortunately, smart meters. Smart Grid, IEEE Transactions on,
these ad hoc devices may not have perceptible impact 3(2):787–796, 2012.
on the energy disaggregation results in practice, because [7] A. Elisseeff and J. Weston. A kernel method for
their energy consumption is usually not high compared multi-labelled classification. In Advances in neural
to other stable appliances such as refrigerator and stove. information processing systems, pages 681–687,
2001.
9. CONCLUSIONS [8] EPA. A tool of home product finder by energy
Most existing approaches for energy disaggregation star. http://www.energystar.gov/productfinder/,
either require complex appliances’ signatures or use ma- 2013. [Online; accessed in 18-December-2013].
chine learning techniques to train a “good” model. Their [9] M. Figueiredo, B. Ribeiro, and A. M. de Almeida.
effectiveness has been questioned by the lack of commonly- On the regularization parameter selection for
accepted signatures or by the fluctuation in estimation sparse code learning in electrical source
results due to the difficult model training process. It has separation. In Adaptive and Natural Computing
been challenging to develop a simple and broadly appli- Algorithms, pages 277–286. Springer, 2013.
cable method in this important application domain. [10] M. A. Figueiredo, J. B. Dias, J. P. Oliveira, and
In this paper, we proposed a simple, universal model R. D. Nowak. On total variation denoising: A new
for energy disaggregation. We only make use of read- majorization-minimization algorithm and an
ily available information of appliances, e.g., those from experimental comparison with wavalet denoising.
users’ manual, technical specification, or some public In Image Processing, 2006 IEEE International
websites. We built a sparse switching event recover- Conference on, pages 2633–2636. IEEE, 2006.
ing model, based on the sparsity of appliances’ switch- [11] S. Gupta, M. S. Reynolds, and S. N. Patel.
ing events. Furthermore, we used the active epochs of Electrisense: single-point sensing using emi for
switching events to develop a parallel local optimization electrical event detection and classification in the
algorithm to solve our model efficiently. In addition to home. In Proceedings of the 12th ACM
analyzing the complexity and correctness of our algo- international conference on Ubiquitous computing,
rithm, we tested our method with the real-world trace pages 139–148. ACM, 2010.
data from an energy monitoring platform we deployed, [12] G. W. Hart. Nonintrusive appliance load
which records the power readings from a group of house- monitoring. Proceedings of the IEEE,
hold appliances. The test results demonstrated that our 80(12):1870–1891, 1992.
method can achieve better performance than the state- [13] H. Kim, M. Marwah, M. F. Arlitt, G. Lyon, and
of-the-art solutions, including the Least Square Estima- J. Han. Unsupervised disaggregation of low

10
frequency power measurements. In SDM, pages from Sensor Data, pages 34–42. ACM, 2012.
747–758, 2011. [26] M. Zeifman and K. Roth. Nonintrusive appliance
[14] J. Z. Kolter, S. Batra, and A. Ng. Energy load monitoring: Review and outlook. Consumer
disaggregation via discriminative sparse coding. In Electronics, IEEE Transactions on, 57(1):76–84,
Advances in Neural Information Processing 2011.
Systems, pages 1153–1161, 2010.
[15] J. Z. Kolter and M. J. Johnson. Redd: A public
data set for energy disaggregation research. In APPENDIX
proceedings of the SustKDD workshop on Data A. PROOF OF NP-HARDNESS OF SSER
Mining Applications in Sustainability, pages 1–6,
2011.
[16] L. B. N. Laboratory. Standby power summary A.1 Preparation
table. First, we introduce a tree structure T (M, T ), which
http://standby.lbl.gov/summary-table.html, 2014. is a complete M -ary tree with height of T , i.e., every
[Online; accessed in 10-January-2014]. internal node has exactly M children and all leaves have
[17] H. Lam, G. Fung, and W. Lee. A novel method to the same depth of T . By default, the height of the root
construct taxonomy electrical appliances based on is 0. Furthermore, each edge (i, j) of T has a non-
load signatures. Consumer Electronics, IEEE negative cost c(i, j), which will be defined later.
Transactions on, 53(2):653–660, 2007. Assume that the aggregated power readings from time
[18] C. Laughman, K. Lee, R. Cox, S. Shaw, S. Leeb, t = 1 to T are generated by N appliances whose rated
L. Norford, and P. Armstrong. Power signature power and power deviation are known. Given the ini-
analysis. Power and Energy Magazine, IEEE, tial state of all appliances S0 , we can build the following
1(2):56–63, 2003. tree:
[19] S. B. Leeb, S. R. Shaw, and J. L. Kirtley Jr. Step 1. Set S0 as the root of the tree.
Transient event detection in spectral envelope Step 2. For each leaf node Si (or S0 in the first
estimates for nonintrusive load monitoring. Power iteration), set its children as all possible states that can
Delivery, IEEE Transactions on, 10(3):1200–1210, be transited from Si . As a result, we can add M children
1995. to Si , where M = 2N .
[20] L. K. Norford and S. B. Leeb. Non-intrusive Step 3. Set the edge cost between Si and its child
electrical load monitoring in commercial buildings Sj as
based on steady-state and transient load-detection 
algorithms. Energy and Buildings, 24(1):51–64, kSj − Si k1 , if Sj satisfies (9)
c(i, j) =
1996. ∞ , else.
[21] S. Osher, A. Solé, and L. Vese. Image
decomposition and restoration using total Step 4. Repeat Step 2 to Step 4 from t = 1 to T . At
variation minimization and the h 1. Multiscale the end, we construct a tree T (M, T ), where M = 2N .
Modeling and Simulation, 1(3):349–370, 2003. Thus, we can translate SSER into the problem of find-
[22] O. Parson, S. Ghosh, M. Weal, and A. Rogers. ing the minimum-cost path in T (M, T ) from the root to
Non-intrusive load monitoring using prior models a leaf (we call such path a full path in the following). We
of general appliance types. In AAAI, 2012. reduce the optimization problem to its decision version.
[23] S. R. Shaw, S. B. Leeb, L. K. Norford, and R. W.
Cox. Nonintrusive load monitoring and Definition 3. Decision version of SSER (d-SSER):
diagnostics in power systems. Instrumentation Given a constant k, find out whether or not there exists
and Measurement, IEEE Transactions on, a full path in T (M, T ) with total cost no larger than a
57(7):1445–1454, 2008. constant k.
[24] K. Suzuki, S. Inagaki, T. Suzuki, H. Nakamura,
and K. Ito. Nonintrusive appliance load The d-SSER can be re-formulated as
monitoring based on integer programming. In
SICE Annual Conference, 2008, pages 2742–2747. d-SSER = {hT , c, ki :T (M, T ),
IEEE, 2008. c is the cost function ,
[25] Y. Wang, X. Hao, L. Song, C. Wu, Y. Wang,
k ∈ <+ , and
C. Hu, and L. Yu. Tracking states of massive
electrical appliances by lightweight metering and T has a full path with cost ≤ k}.
sequence decoding. In Proceedings of the Sixth
International Workshop on Knowledge Discovery We next reduce a well-known NP-complete problem,
the Traveling Salesman Problem (TSP) to d-SSER. TSP

11
1
can be formulated as
c12 c13
0
TSP = {hG, c , ki :G = (V, E) is a complete graph , 2 3
1
c0 is the cost function , ∞ c23 ∞ c23

c12 c13 1 3 1 2
k ∈ <+ , and ∞ ∞ c13 ∞ ∞ ∞ c12 ∞
2 c23 3
G has a Hamiltonian cycle with cost 2 3 1 2 2 3 1 3

≤ k}.
Figure 5: An example showing the construction
We complete the proof in two steps: firstly we show of T with G
that d-SSER is NP; then, we prove that d-SSER is NP-
complete by showing TSP ≤P d-SSER, i.e., there exists
a reduction from TSP to d-SSER. – (⇒)

A.2 d-SSER is NP G has a Hamiltonian cycle with cost ≤ k.


⇒ there exists a full path in T with cost ≤ k.
• Certificate: A path of T (M, T ).
(Note that there will be no internal node along
• Algorithm: 1) Check that the path is full, i.e., the
path starts from the root and ends at a leaf. 2) the path occurring more than once, otherwise
Calculate the total edge costs along the path and the cost will be infinite based on the rules in Step 4.)
check if it is no larger than k. – (⇐)
• Polynomial Time: We need T + 1 steps to check
the path and obtain its total cost. T has a full path with cost ≤ k.
⇒ there exists a traverse instance in its
A.3 d-SSER is NP-Complete
corresponding graph G with cost ≤ k.
• Firstly, we develop an algorithm F : hG, c0 , ki → (Note that based on the tree construction,
hT , c, ki, i.e., G and c0 in TSP can be transferred
only the full paths starting and ending at the
to T and c in d-SSER as follow:
Step 1. Choose any node of G as the root of T ; same node can have a cost no larger than k,
Step 2. For each leaf node of the current tree, add because other paths have a cost of infinity.)
its children as all the other nodes of G. Since G is ⇒ so G has a Hamiltonian cycle with cost ≤ k.
a complete graph, we can add |V | − 1 children to
each leaf node, where |V | is the number of nodes With above facts, we have proved that d-SSER is
in G. NP-complete. Since SSER problem is no easier than
d-SSER, the former is NP-hard.
Step 3. Repeat Step 2 for |V | times. At the end,
we build T (M, T ), where M = |V |−1 and T = |V |.
B. COMPUTATIONAL COMPLEXITY
Step 4. Set the cost of edge (i, j) in T , c(i, j), as
follows: For the problem in (10), if we want to find the mini-
mum total variation with a brute-fore method, we have
1. Initialization: c(i, j) = c0 (i, j), where c0 (i, j) is
to traverse all possible solutions for state matrix S. On
the edge cost in G.
each time instant, there are 2N possible combinations
2. For each edge (i, j) of T where j is a non-leaf of states for N appliances. Therefore, from time t = 1
node, if j has appeared in the path from the to T , the total number of feasible solutions is up to
root (including the root) to i, i.e., j is an an- T
2N . Thus, the computational complexity of brute-
cestor of i in the tree already, set c(i, j) = ∞.
fore method to problem (10) is O(2N ·T ), which is expo-
3. For each edge (i, j) of T where j is a leaf node, if nential.
j is not the same as the root node, set c(i, j) = As to the optimization problem in (12), the whole
∞. searching space is limited to k small local windows, and
To help understand the construction of T with G, the optimization is confined within the local windows.
Fig. 5 shows an example with three nodes in G. Assume that the longest active epoch has a size of `.
• Secondly, it is easy to see that F takes O(T 2 ) run- The searching space is 2N ·` . Since from t = 1 to T the
ning time. total number of active epochs is k, the total computa-
tional complexity is upper bounded by k · 2N ·` . Consid-
• Thirdly, we show that
ering that ` is usually much smaller than T , the compu-
tational complexity of the original problem is cut down
hG, c0 , ki ∈ TSP ⇔ hT , c, ki ∈ d-SSER. significantly.

12

You might also like