PhDThesis AttilaReiss
Attila Reiss
D386
Abstract
Regular physical activity is essential to maintain or even improve an individual’s health. There exist various guidelines on how much physical activity individuals should do. Therefore, it is important to monitor the physical activities people perform during their daily routine in order to determine how far they meet professional recommendations. This thesis aims to develop a mobile, personalized physical activity monitoring system applicable to everyday life scenarios. Of the mentioned recommendations, this thesis concentrates on monitoring aerobic physical activity. Two main objectives are defined in this context. On the one hand, the goal is to estimate the intensity of performed activities, i.e. to distinguish activities of light, moderate or vigorous effort. On the other hand, to give a more detailed description of an individual’s daily routine, the goal is to recognize basic aerobic activities (such as walk, run or cycle) and basic postures (lie, sit and stand).
With recent progress in wearable sensing and computing, the technological tools to create the envisioned physical activity monitoring system largely exist nowadays. Therefore, the focus of this thesis is on the development of new approaches for physical activity recognition and intensity estimation that extend the applicability of such systems. In order to make physical activity monitoring feasible in everyday life scenarios, the thesis deals with questions such as 1) how to handle a wide range of activities, e.g. everyday, household or sport activities, and 2) how to handle various potential users. Moreover, this thesis deals with the realistic scenario where either the currently performed activity or the current user is unknown during the development and training phase of activity monitoring applications. To answer these questions, this thesis proposes and develops novel algorithms, models and evaluation techniques, and performs thorough experiments to demonstrate their validity.
The contributions of this thesis are of both theoretical and practical value. Addressing the challenge of creating robust activity monitoring systems for everyday life, the concept of other activities is introduced, and various models to handle them are proposed and validated. Another key challenge is that complex activity recognition tasks exceed the capabilities of existing classification algorithms. Therefore, this thesis introduces a confidence-based extension of the well-known AdaBoost.M1 algorithm, called ConfAdaBoost.M1. Thorough experiments show that it significantly improves performance compared to commonly used boosting methods. A further major theoretical contribution is the introduction and validation of a new general concept for the personalization of physical activity recognition applications, and the development of a novel algorithm (called Dependent Experts) based on this concept. A major contribution of practical value is the introduction of a new evaluation technique (called leave-one-activity-out) that simulates the case when previously unknown activities are performed while using a physical activity monitoring system. Furthermore, the creation and benchmarking of publicly available physical activity monitoring datasets within this thesis directly benefit the research community. Finally, the thesis deals with issues related to the implementation of the proposed methods, in order to realize the envisioned mobile system and integrate it into a full healthcare application for aerobic activity monitoring and support in daily life.
Acknowledgments
Many people have supported, influenced and helped me in the process that ultimately resulted in this thesis. First, I would like to thank Prof. Dr. Béla Pataki from the Budapest University of Technology and Economics, whose classes on machine learning greatly inspired me. My special interest in ensemble learners dates back to this time, and it led to arguably the most important contributions of this thesis.
Over the course of my thesis I submitted papers to various conferences and received a good number of scientific reviews of my work. Many of these reviews were quite helpful, providing constructive criticism that often led to new ideas. Therefore, I would like to thank all the anonymous reviewers of these conferences. Furthermore, I would like to thank the organizers and participants of the Workshop on Robust Machine Learning Techniques for Human Activity Recognition held at SMC 2011, which was a truly inspiring event for me.
For evaluation purposes I mainly used two datasets throughout my thesis, namely the PAMAP and PAMAP2 datasets. These datasets were recorded with co-workers and students at DFKI. I would like to thank all the anonymous volunteers who participated in these data recordings – and I am sorry for making you guys iron my shirts under scientific pretences! Moreover, I would like to thank my students Benjamin Schenkenberger and Markus Gräb for their help in the development of the physical activity monitoring system prototypes. Furthermore, I would like to thank Vladimir Hasko for providing me with various illustrations.
I would like to thank my supervisor, Prof. Dr. Didier Stricker, for the opportunity to carry out my research work. I would also like to thank the other two members of my committee: Prof. Dr. Paul Müller for accepting the role of chair of the committee, and Prof. Dr. Paul Lukowicz for agreeing to be the co-examiner of my thesis.
My very special thanks go to Dr. Gustaf Hendeby, who supported me in countless ways over the course of the thesis. His way of being critical but always constructive, and of paying attention to the smallest details, helped me in many aspects of performing rigorous scientific work. Gustaf, I thank you for our stimulating discussions, your countless pieces of advice and valuable feedback, and for always taking an interest in my work! I also thank you for all the practical help over the years: helping out with my hardware problems, being my personal LaTeX, git, C++, etc. expert and dealing with my annoying questions, or even providing medical service at midnight if needed. I am also grateful to you for proofreading my thesis and thereby improving its quality. Overall, I believe that several really good papers show the result of our fruitful cooperation – and I hope to continue this in the future!
During my time as a Ph.D. candidate I was a researcher in the Augmented Vision group at DFKI, Kaiserslautern. I would like to thank many of my former colleagues there for all the activities that provided a welcome distraction from the hard and stressful work of a scientist, such as bouldering, playing squash, soccer or billiards, or just enjoying a nice cup of hot chocolate from the fourth-floor vending machine. In particular, I would like to thank Leivy Michelly Kaul for helping out with all the administrative challenges and always having a friendly word for me. I would also like to thank Christiano Gava, my long-term weekend buddy, whose presence made all the Saturdays and/or Sundays spent at work less monotonous. My very special thanks go to Sarvenaz Salehi, who made the last year I spent on my thesis, including the entire writing process, so much more enjoyable. Azizam, I am also thankful for the great doctoral hat, a truly personal gift with all the memories from this time!
Last but not least, I would like to thank my family. I would like to thank my little brother Tibor (sorry, Dr. Tibor Reiss), who received his doctoral degree well before me. This embarrassing fact was a strong motivation for me to finish my thesis as soon as possible. Now it’s done – this means no more jokes about it! I would also like to thank my parents. Without their support during and beyond the thesis, none of this would have been possible. Therefore, I would like to dedicate my thesis to my parents.
Contents
1 Introduction
1.1 Background
1.2 Motivation
1.2.1 The Need of Regular Physical Activity
1.2.2 The Tools Provided by Wearable Technology
1.3 Problem Statement
1.4 Contributions
1.5 Thesis Outline
2 Related Work
2.1 Introduction
2.2 Activities
2.2.1 Low-Level Activities
2.2.2 High-Level Activities
2.2.3 Activities of Daily Living
2.3 Sensors
2.3.1 Inertial Sensors
2.3.2 Physiological Sensors
2.3.3 Image-based Sensing
2.3.4 Audio-based Sensing
2.3.5 Object Use
2.3.6 Radio-based Sensing
2.3.7 Combination of Different Types of Sensors
2.4 Learning Methods
2.5 Applications
2.5.1 Fitness, Sport
2.5.2 Healthcare
2.5.3 Assisted Living, Elderly Care
2.5.4 Industry: Manufacturing and Services
2.5.5 Other Application Areas
2.6 Conclusion
9 Conclusion
9.1 Results
9.2 Future Work
Bibliography
1 Introduction
Regular physical activity is essential to maintain or even improve an individual’s health. There exist various guidelines on how much physical activity individuals should do. Therefore, there is a need to monitor performed physical activities in order to compare them to professional recommendations. For a long time, questionnaires about an individual’s physical activity were the main tool of clinical personnel, which allowed only highly imprecise monitoring. However, with recent progress in wearable technology, unobtrusive, mobile, long-term physical activity monitoring has become feasible.
The overall goal of this thesis is the development of a physical activity monitoring system with two main objectives. On the one hand, the goal is to monitor how far individuals meet professional recommendations. Concentrating on aerobic activity, this means estimating the intensity of performed activities, i.e. distinguishing activities of light, moderate and vigorous effort. On the other hand, to give a more detailed description of an individual’s daily routine, the goal is to recognize basic aerobic activities and basic postures.
Since the technological tools to create the envisioned physical activity monitoring system largely exist nowadays, the focus of this thesis is on developing methods that extend the applicability of such systems. In order to make physical activity monitoring feasible in everyday life scenarios, the thesis deals with questions such as 1) how to handle a wide range of activities, e.g. everyday, household or sport activities, and 2) how to handle various potential users. Moreover, this thesis deals with the realistic scenario where either the currently performed activity or the current user is unknown during the development and training phase of activity monitoring applications. To answer these questions, this thesis proposes and develops novel algorithms, models and evaluation techniques, and performs thorough experiments to demonstrate their validity.
This chapter first presents facts on overweight and obesity in Section 1.1, followed by the motivation for the work performed within this thesis in Section 1.2. Section 1.3 defines the key challenges addressed in this thesis. Section 1.4 briefly describes the contributions, each of which is presented in one of the following chapters. Finally, Section 1.5 gives an outline of the thesis.
1.1 Background
According to the World Health Organization (WHO), the number of overweight and obese people is increasing rapidly [191]. An increased body mass index (BMI) is a major risk factor for medical conditions such as diabetes, cardiovascular diseases, musculoskeletal disorders and certain types of cancer. This makes overweight and obesity the fifth leading risk factor for global deaths.
Overweight and obesity are caused by an energy imbalance between calories consumed (through food and beverages) and calories expended (e.g. through physical activity). According to the WHO, the two main factors are [191]:
• An increased intake of energy-dense foods that are high in fat, salt and sugars but low in vitamins, minerals and other micronutrients.
• An increase in physical inactivity due to the increasingly sedentary nature of many forms of work, changing modes of transportation and increasing urbanization.
Recent studies suggest that the increasing number of overweight and obese people is driven more by a reduction in energy expenditure than by a rise in energy intake. The key facts given in the WHO fact sheet on obesity and overweight are the following (cf. [191]; the fact sheet was last updated in March 2013):
• 65% of the world’s population live in countries where overweight and obesity kill more people than underweight (overweight and obesity being the fifth leading risk factor for global deaths).
• In addition, 44% of the diabetes burden, 23% of the ischaemic heart disease
burden and between 7% and 41% of certain cancer burdens are attributable to
overweight and obesity.
• More than 40 million children under the age of five were overweight in 2010.
• Overweight and obesity are preventable. Besides a balanced diet, regular physical activity is a key element in reducing an individual’s BMI.
1.2 Motivation
In response to the facts presented above, regular physical activity is essential. Its importance has been proven, and there exist various guidelines and recommendations on how much physical activity individuals should do. Therefore, this section will argue that there is a need to monitor individuals’ physical activities in their daily routine. Moreover, this section will show that, with recent progress in wearable sensing and computing, the technological tools exist nowadays to create the envisioned physical activity monitoring systems.
• Regular physical activity reduces the risk of many adverse health outcomes.
• For most health outcomes, additional benefits occur as the amount of physical
activity increases through higher intensity, greater frequency and/or longer du-
ration.
• Most health benefits occur with at least 150 minutes of moderate-intensity phys-
ical activity per week, such as brisk walking. Additional benefits occur with
more physical activity.
• Health benefits occur for children and adolescents, young and middle-aged
adults, older adults, and those in every studied racial and ethnic group.
• The health benefits of physical activity occur for people with disabilities.
• The benefits of physical activity far outweigh the possibility of adverse out-
comes.
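Comparing monitored activity with such a recommendation is simple arithmetic once the intensity of each activity bout is known. The sketch below assumes the commonly used MET cut-offs (below 3.0 METs light, 3.0 to below 6.0 moderate, 6.0 and above vigorous) and the rule of the 2008 Physical Activity Guidelines for Americans that one vigorous minute counts as two moderate minutes; both are assumptions not stated in the list above, the bout data are hypothetical, and minimum bout lengths are ignored.

```python
from typing import Iterable, Tuple

# Assumed MET cut-offs: < 3.0 light, 3.0 to < 6.0 moderate, >= 6.0 vigorous.
MODERATE_MIN_MET = 3.0
VIGOROUS_MIN_MET = 6.0


def intensity_class(met: float) -> str:
    """Map a MET value to light, moderate or vigorous effort."""
    if met < MODERATE_MIN_MET:
        return "light"
    if met < VIGOROUS_MIN_MET:
        return "moderate"
    return "vigorous"


def moderate_equivalent_minutes(bouts: Iterable[Tuple[float, float]]) -> float:
    """Sum (met, minutes) bouts: moderate minutes count once, vigorous twice, light not at all."""
    total = 0.0
    for met, minutes in bouts:
        level = intensity_class(met)
        if level == "moderate":
            total += minutes
        elif level == "vigorous":
            total += 2.0 * minutes
    return total


def meets_weekly_recommendation(bouts: Iterable[Tuple[float, float]],
                                target_minutes: float = 150.0) -> bool:
    """True if the moderate-equivalent minutes of one week reach the target."""
    return moderate_equivalent_minutes(bouts) >= target_minutes


if __name__ == "__main__":
    # Hypothetical week of (MET value, minutes) bouts.
    week = [(3.5, 30), (3.5, 30), (3.5, 30), (8.0, 20), (2.0, 120)]
    print(moderate_equivalent_minutes(week), meets_weekly_recommendation(week))  # 130.0 False
```

Here the 120 minutes of light activity do not count at all, so the hypothetical week falls short of 150 minutes despite its total duration.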
The progress of wearable computing can be followed by examining the contributions to the yearly IEEE International Symposium on Wearable Computers (ISWC), arguably the most important conference in this research field. Since the first ISWC conference (held in October 1997), most of the key topics of this field have advanced tremendously, as pointed out by Thomas [175].
On the one hand, wearable sensing has advanced in many ways. Nowadays, small, lightweight, low-cost and accurate sensor units are commercially available, supporting wireless data transfer, internal data storage, etc. With this progress it becomes feasible for individuals to wear various sensor units all day. Further miniaturization of the sensors, their integration into worn devices (e.g. the concept of a smart watch, cf. the eZ430-Chronos system from Texas Instruments [44] or the Sony SmartWatch [163]) and their integration into garments (the concept of e-textiles, presented e.g. in [26]) will make wearing sensors completely unobtrusive.
On the other hand, with the appearance of smartphones, the original goals set for wearable computers have even been exceeded [175]. With smartphone technology, a pervasive control unit is widely available, also providing a large amount of computation and graphics power to individuals. Different sensors integrated in smartphones have reached a quality sufficient to e.g. monitor the movement of their owners. Moreover, current mobile operating systems (e.g. Android) provide a convenient way of developing applications for smartphone-based solutions.
Overall, with the presented advances in wearable sensing and wearable computing, the technological tools exist to develop a mobile, unobtrusive and accurate physical activity monitoring system. Therefore, the realization of long-term monitoring of individuals’ physical activities while they perform their daily routine – the goal set and motivated in the previous subsection – has become feasible.
There are countless physical activities (e.g. 605 different activities are listed in [1]), thus it is not feasible to recognize all of them. Moreover, in practice, activity monitoring systems usually focus on only a few activities. Nevertheless, all the other activities should not be completely ignored; instead, different solutions for dealing with them should be investigated in order to enhance the applicability of the developed systems. On the one hand, this includes the investigation of how to model these other activities in the classification problems defined for activity monitoring tasks. On the other hand, proper evaluation techniques should be introduced to simulate the effect of (known or unknown) other activities.
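One possible way to realize both points is sketched below in Python: activities outside a chosen set of basic activities are mapped to a single "other" class, and a leave-one-activity-out split holds out one other activity at a time from training, so that it acts as a previously unknown activity at test time. This is an illustrative assumption about how such splits could be built, not the exact protocol of this thesis; the activity names and one-dimensional features are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

# (activity label, feature vector) pairs; labels and features are hypothetical.
Sample = Tuple[str, List[float]]


def to_task_label(activity: str, basic: Sequence[str]) -> str:
    """Keep labels of basic activities; map every other activity to 'other'."""
    return activity if activity in basic else "other"


def leave_one_activity_out(
    samples: Sequence[Sample], basic: Sequence[str]
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_activity, train, test) once per non-basic activity.

    The held-out activity never appears in the training data, so at test time
    it plays the role of a previously unknown activity that should be
    classified as 'other'.
    """
    other_activities = sorted({a for a, _ in samples if a not in basic})
    for held_out in other_activities:
        train = [(to_task_label(a, basic), x) for a, x in samples if a != held_out]
        test = [("other", x) for a, x in samples if a == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    data = [("walk", [1.0]), ("run", [2.0]), ("iron", [0.3]),
            ("vacuum", [0.5]), ("walk", [1.1])]
    for held_out, train, test in leave_one_activity_out(data, basic=("walk", "run")):
        print(held_out, len(train), len(test))  # iron 4 1, then vacuum 4 1
```

In this simplified version the test fold contains only the held-out activity; a complete evaluation would presumably also test on the basic activities.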
Compared to dealing only with basic activities and postures, the introduction of a large number of other activities clearly increases the complexity of the activity recognition and intensity estimation tasks. Therefore, it should be investigated how well existing classification approaches perform on these tasks. In case the desired accuracy cannot be reached, novel algorithms should be developed. In order to evaluate the proposed methods, proper datasets – including a wide range of physical activities – are required. However, in the field of physical activity monitoring there is a lack of such commonly used, standard datasets. Therefore, the thesis will also address this issue by creating and releasing such datasets, and by benchmarking various activity monitoring problems.
Another key challenge addressed in this thesis is related to the fact that activity monitoring systems are usually trained on data from a large number of subjects, and then used by a new subject whose data is not available in the training phase. Moreover, there is a high variability among potential users (concerning e.g. age, weight or physical fitness), thus the individual accuracy can vary a lot. Therefore, personalization approaches for activity recognition have recently become a topic of interest. However, existing solutions have several practical limitations, concerning e.g. computational time or their applicability to complex classification tasks. The goal of this thesis is to overcome these limitations by developing a fast and accurate personalization approach for mobile physical activity recognition applications.
Altogether, the overall goal of this thesis can be refined: to develop a mobile, personalized physical activity monitoring system applicable to everyday life scenarios. The next chapters will present the proposed methodology to deal with the challenges discussed here, and how the proposed algorithms can be realized as part of a state-of-the-art mobile activity monitoring application. The system created this way can be used with fewer constraints under realistic, everyday conditions than the systems presented in previous, related work.
1.4 Contributions
This section briefly describes the contributions presented in each of the following chapters of this thesis. Moreover, it provides information on where these contributions have been published.
Chapter 3 addresses the lack of a commonly used, standard dataset in the field of physical activity monitoring. Two new datasets (the PAMAP and the PAMAP2 datasets) are created and made publicly available to the research community.
Attila Reiss and Didier Stricker. Towards global aerobic activity monitoring. In Proceedings of 4th International Conference on Pervasive Technologies Related to Assistive Environments (PETRA), Crete, Greece, May 2011.
Attila Reiss and Didier Stricker. Creating and benchmarking a new dataset for physical activity monitoring. In Proceedings of 5th Workshop on Affect and Behaviour Related Assistance (ABRA), Crete, Greece, June 2012.
The subsequent chapters rely heavily on both new datasets, using them for the evaluation of the proposed methods. Moreover, making these datasets publicly available has had some impact on the research community, cf. Section 3.5.
Chapter 4 addresses the lack of established benchmarking problems in the field of physical activity monitoring. A benchmark is given, using a complete data processing chain (DPC) and comparing commonly used classification algorithms on a set of defined physical activity monitoring tasks. The benchmark shows the difficulty of different classification problems and reveals some of the challenges in this research field. The description of the benchmark and the results are given in [135, 136]. The applied DPC is first described in [133]; an extended description is given in the journal paper [137],
Attila Reiss, Markus Weber, and Didier Stricker. Exploring and extending the boundaries of physical activity recognition. In Proceedings of 2011 IEEE International Conference on Systems, Man and Cybernetics (SMC), Workshop on Robust Machine Learning Techniques for Human Activity Recognition, pages 46–50, Anchorage, AK, USA, October 2011.
Further results on this matter are shown in the benchmark of [135, 136], providing evidence that subject-independent validation techniques should generally be preferred for physical activity monitoring.
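A common subject-independent scheme is leave-one-subject-out cross-validation: each fold tests on all data of one subject and trains on the data of all remaining subjects, mimicking a new user whose data is unavailable during training. A minimal pure-Python sketch of the fold generation follows (the subject identifiers are hypothetical); scikit-learn’s `LeaveOneGroupOut` provides equivalent splits.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (subject, train_indices, test_indices), one fold per subject.

    subject_ids[i] is the subject who recorded sample i.
    """
    indices_by_subject: Dict[Hashable, List[int]] = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        indices_by_subject[subject].append(index)
    for subject in sorted(indices_by_subject, key=str):
        test = indices_by_subject[subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


if __name__ == "__main__":
    subjects = ["s1", "s1", "s2", "s3", "s2"]  # hypothetical subject per sample
    for subject, train, test in leave_one_subject_out(subjects):
        print(subject, train, test)
```

Unlike a random split, no sample of the test subject can leak into training, which is what makes the resulting accuracy estimate subject-independent.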
On the other hand, Chapter 5 focuses on including various other activities in activity monitoring classification tasks. Different models are proposed and evaluated, as described in the conference paper [142],
Attila Reiss, Gustaf Hendeby, and Didier Stricker. Towards robust activity recognition for everyday life: methods and evaluation. In Proceedings of 7th International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth), Venice, Italy, May 2013.
Chapter 1 Introduction (this chapter). Motivates research work in the field of physical activity monitoring, defines challenges related to this topic and lists the contributions of this thesis.
Chapter 3 Datasets for Physical Activity Monitoring. Introduces two new datasets, recorded from a reasonable number of subjects performing a wide range of physical activities. Both datasets include ground truth and are publicly available to the research community.
Chapter 4 Data Processing and Classification. Presents a data processing chain, describing the feature extraction and classification steps in more detail. It also introduces a benchmark, comparing commonly used classification algorithms on different physical activity monitoring tasks.
Chapter 5 Robust Activity Monitoring for Everyday Life: Methods and Evaluation. Investigates the development and evaluation of robust methods for everyday life scenarios, with a focus on the tasks of aerobic activity recognition and intensity estimation.
Chapter 9 Conclusion. Summarizes the thesis, draws conclusions and gives ideas for possible future extensions of the presented research work.
2.2 Activities
A wide range of activities has been monitored and recognized in related work. Nevertheless, a definitive and commonly used categorization of them is not provided in the literature. Huynh [74] presents a possible way to categorize activities by grouping them based on duration and complexity. This categorization defines 3 groups: gestures (brief and distinct body movements), low-level activities (sequences of movements or a distinct posture) and high-level activities (collections of activities). Gesture recognition is a large research topic in itself, but is not within the focus of this thesis. A brief overview of related work on low-level and high-level activity monitoring will be given in the following subsections.
There are various classes of activities that cannot be clearly assigned to the 3 categories defined above. A class of activities relevant for this thesis are the activities of daily living (ADL). A brief overview of the existing literature related to these activities is presented in Section 2.2.3. Further important topics of activity monitoring research are e.g. fall detection [24, 67, 95, 104, 151], sleep monitoring (wake-sleep patterns, quality of sleep) [23, 150, 183] and the recognition of workshop or assembly activities [101, 165, 188]. Since these latter topics are outside the scope of this thesis, they are not further investigated here.
cising and the user’s daily routine. Another example is presented in [8]: 19 activities (mainly aerobic sport activities, such as cycling, rowing or exercising on a cross trainer) are recognized using 5 inertial measurement units. Since this thesis mainly focuses on the monitoring of low-level activities (cf. Section 2.6), in particular on the recognition of a wide range of activities with multiple sensors, further examples of related work on this topic will be presented in the remaining chapters.
of elderly and their need for assisted living. The following activities are included in the set of ADLs: bathing, dressing, toileting, transferring, continence and feeding. Moreover, Lawton and Brody [91] proposed another set of activities, called instrumental activities of daily living (IADL), in order to assess how well elderly people interact with the physical and social environment. The set of IADLs consists of the following activities: using the telephone, shopping, food preparation, housekeeping, doing laundry, transportation, taking medications and handling finances.
Various approaches exist for the monitoring and recognition of specific subsets of ADLs/IADLs. For example, Stikic et al. [166] combine RFID tags and accelerometers to recognize 10 housekeeping activities (such as dusting, ironing or vacuum cleaning). With these sensing modalities they combine two main assumptions related to ADLs: 1) the objects people use during the execution of an activity robustly categorize that activity (RFID tags), and 2) the activity is defined by the body movements during its execution (accelerometers). The authors of [106] use a wrist-worn device to distinguish 15 ADLs. This device includes the following sensors: accelerometer, microphone, camera, illuminometer and digital compass. Results show that the camera is the most important single sensor in the recognition of this subset of ADLs, followed by the accelerometer and the microphone. The work by Maekawa et al. [107] introduces the concept of mimic sensors: a mimic sensor node has the shape of an object such as an AA battery or an SD memory card, and provides the functions of the original object. Moreover, these sensor nodes provide additional information (e.g. the current flow of the device), which can be used to detect electrical events. These events can then be used to recognize ADLs such as shaving or vacuum cleaning. Further recent examples of research on monitoring and recognizing ADLs/IADLs are presented e.g. in [30, 182, 197].
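To illustrate how a current-flow signal could be turned into electrical events, the sketch below detects switch-on and switch-off moments as crossings of a fixed current threshold. This is a simplified assumption, not the method of Maekawa et al. [107]; the threshold value and the example current trace are hypothetical.

```python
from typing import List, Sequence, Tuple


def switch_events(current_ma: Sequence[float],
                  threshold_ma: float = 50.0) -> List[Tuple[int, str]]:
    """Return (sample_index, 'on' | 'off') wherever the current crosses the threshold.

    current_ma holds current samples in milliamperes; the 50 mA default is a
    hypothetical value that would have to be tuned per device.
    """
    events: List[Tuple[int, str]] = []
    was_on = False
    for index, value in enumerate(current_ma):
        is_on = value >= threshold_ma
        if is_on and not was_on:
            events.append((index, "on"))
        elif was_on and not is_on:
            events.append((index, "off"))
        was_on = is_on
    return events


if __name__ == "__main__":
    trace = [0.0, 2.0, 120.0, 130.0, 125.0, 3.0, 1.0]  # hypothetical trace
    print(switch_events(trace))  # [(2, 'on'), (5, 'off')]
```

The time between an "on" and the following "off" event then gives the duration of device use, which could serve as one input for recognizing an ADL such as shaving.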
2.3 Sensors
A wide range of sensors, from binary switches to cameras, has been investigated and used in related work on activity monitoring. Generally, two types of sensors can be distinguished in this field: wearable sensors (placed on the user) and ambient sensors (placed around the user). Which type and how many sensors should be used mainly depends on the application. In any case, the selection of sensors is an important step when developing an activity monitoring system, and several factors should be considered:
• How intrusive the sensor is for the user. In the case of wearable sensors this means e.g. how comfortable they are, and in the case of ambient sensors whether they are in sight, e.g. in a home environment.
• Privacy issues: how much and how sensitive information is recorded and stored by the sensor (one of the main concerns with sensors such as cameras or microphones).
• Ease of setup, e.g. in the case of wearable sensors the user should be able to put on the sensors without external help and in a short time.
• What the sensor measures: the provided sensor data should allow as many of the targeted activities as possible to be differentiated.
This section presents the sensing modalities most commonly applied in the activity monitoring literature: inertial sensors, physiological sensors, cameras, audio sensors, sensors deployed in objects and radio-based sensing. Finally, the combination of complementary sensors applied in related work is discussed in the last subsection.
following vital signs were included in their study: ECG (electrocardiogram), heart rate, respiratory effort, oxygen saturation, skin resistance and skin temperature. However, they found that physiological signals did not provide very useful data for activity recognition. They concluded that physiological signals correlate with the intensity level of performed activities, but do not reflect the type of activity. Moreover, Pärkkä et al. [117] observed larger interindividual differences in the measured vital signs than e.g. in the inertial signals, further limiting the applicability of physiological sensors.
The work presented by Lara et al. [89] comes to a different conclusion than [117]: the authors state that vital signs are indeed useful to discriminate between certain activities. They use the BioHarness BT chest sensor strap [199], which measures several physiological attributes besides 3D acceleration: heart rate, respiration rate, breath amplitude, skin temperature and ECG amplitude. The authors apply structure detectors to the physiological signals, and propose two new features for vital signs: magnitude of change and trend. With these features, the discrimination between activities during periods of vital sign stabilization can be improved.
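The exact definitions of these two features are given in [89] and are not reproduced here. As a rough illustration only, the sketch below interprets magnitude of change as the range (maximum minus minimum) of a vital-sign window and trend as the least-squares slope over that window; both interpretations, the sampling period and the example heart-rate values are assumptions.

```python
from typing import Sequence, Tuple


def magnitude_and_trend(window: Sequence[float],
                        sample_period_s: float = 1.0) -> Tuple[float, float]:
    """Return (magnitude of change, trend) for one window of a vital sign.

    Assumed interpretations: magnitude = max - min within the window,
    trend = least-squares slope in signal units per second.
    """
    n = len(window)
    if n < 2:
        raise ValueError("a window needs at least two samples")
    times = [i * sample_period_s for i in range(n)]
    mean_t = sum(times) / n
    mean_y = sum(window) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, window))
    variance = sum((t - mean_t) ** 2 for t in times)
    return max(window) - min(window), covariance / variance


if __name__ == "__main__":
    heart_rate = [80, 84, 88, 92, 96]  # hypothetical bpm values, one per second
    print(magnitude_and_trend(heart_rate))  # (16, 4.0)
```

A slope-type feature separates a heart rate that is still rising after a change of activity from one that has already stabilized at the same level, which a window mean alone cannot do.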
Further examples of using different physiological sensors for activity monitoring are as follows. Respiration rate is measured and applied for the assessment of physical activities e.g. in [98, 112]. Features extracted from the GSR (galvanic skin response) signal are good indicators of the presence of mental stress, even when the user performs different activities [170]. Haapalainen et al. [64] address a similar problem, the real-time assessment of cognitive load while the user is active, relying mainly on features extracted from GSR and ECG data. Finally, Chang et al. [29] use a non-contact portable heart rate monitor to predict driver drowsiness.
The recognition of activities and gestures using external cameras has been the focus of
extensive research. A survey on the recognition of human actions and activities from
image data is given in [125]. However, using cameras for activity monitoring has
several major issues. First of all, although video is very informative, automatically
recognizing activities from video data is a complex task. A solution to this issue
is to extract certain features from images, related to e.g. the location of the user or
which objects in the environment are used. Such approaches could enable real-time
recognition of various ADLs. An example is presented by Duong et al. [40]. Multiple
cameras are installed in a room, observing a person performing different activities.
The room is divided into square regions, some of which include objects of interest
(e.g. a stove). The multi-camera system tracks the person, returning a list of visited
regions. This list can then be used to recognize actions such as using the stove.
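The region-tracking idea can be sketched as follows. The cell size, the room layout and the object labels are hypothetical, and a simple lookup stands in for the recognition step of [40], which this section does not describe in detail; the sketch only shows how a tracked trajectory becomes a list of visited regions.

```python
# Hypothetical layout: the floor is split into 1 m x 1 m square cells,
# and some cells are associated with an object of interest.
CELL_SIZE_M = 1.0
OBJECT_CELLS = {(0, 2): "stove", (3, 0): "sink"}


def to_cell(x, y, cell_size=CELL_SIZE_M):
    """Map a tracked floor position (in metres) to its grid cell."""
    return (int(x // cell_size), int(y // cell_size))


def visited_regions(track):
    """Collapse a trajectory of (x, y) positions into the sequence of visited cells."""
    regions = []
    for x, y in track:
        cell = to_cell(x, y)
        if not regions or regions[-1] != cell:
            regions.append(cell)
    return regions


def object_interactions(regions):
    """Stand-in for the recognition step: report objects whose cells were visited."""
    return [OBJECT_CELLS[c] for c in regions if c in OBJECT_CELLS]


track = [(0.5, 0.5), (0.6, 1.4), (0.4, 2.3), (0.5, 2.6), (3.2, 0.4)]
regions = visited_regions(track)
print(regions, object_interactions(regions))
```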
A second major issue of camera-based activity monitoring systems is their lack
of pervasiveness. The cameras are usually installed indoors, and individuals have to stay
within the field of view of the imaging sensors, which strongly limits the
applicability of such systems. A way to overcome this limitation is to use wearable
cameras instead of static ones, which has become feasible due to the recent
miniaturization of hardware components. An example of this wearable vision concept is
presented in [172], where a camera is worn on the shoulder of the user, observing
the interaction with various objects. Moreover, the wrist-worn device presented by
Maekawa et al. [106] for recognizing different ADLs also includes a small camera that
complements the other sensors. A colour histogram is computed from each image captured
by this camera, and this information is used to determine how well each colour
distinguishes a certain ADL class from the other ADL classes.
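The per-image colour histogram can be sketched as below. The quantization (4 bins per RGB channel, giving 64 joint bins) and the function name are assumptions for illustration, not the configuration used in [106]; the subsequent step of scoring how well each colour bin discriminates ADL classes is not shown.

```python
from collections import Counter


def colour_histogram(pixels, bins_per_channel=4):
    """Normalised joint RGB histogram with bins_per_channel**3 bins.

    pixels: iterable of (r, g, b) tuples with 8-bit channel values (0-255).
    """
    if 256 % bins_per_channel:
        raise ValueError("bins_per_channel must divide 256")
    step = 256 // bins_per_channel
    pixels = list(pixels)
    counts = Counter(
        (r // step) * bins_per_channel ** 2 + (g // step) * bins_per_channel + b // step
        for r, g, b in pixels
    )
    return [counts[i] / len(pixels) for i in range(bins_per_channel ** 3)]


# A tiny hypothetical "image": two reddish pixels, one blue, one grey.
frame = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (128, 128, 128)]
hist = colour_histogram(frame)
print(hist[48], hist[3], hist[42])
```

Normalising by the pixel count makes histograms of images with different resolutions directly comparable.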
A further major issue of using video sequences for human activity monitoring is
privacy: Most individuals would have severe concerns about being permanently mon-
itored and recorded by cameras. Wearable imaging sensors could at least partially
solve this issue by simply enabling the user to turn off the sensor when monitoring
and recording is undesired. Another way to protect the user’s privacy is to perform a
certain abstraction of the raw image data directly on the sensor device, so that
only this abstracted information is stored or used for further processing.
Despite the above described major issues and the fact that using cameras is an
expensive way of monitoring activities, they can be used as a source of additional
information to improve the performance of e.g. a wearable system. Bahle et al. [14]
investigate this concept by using vision-based devices in the user’s environment in
an opportunistic way to improve wearable activity recognition. When video data is
available (e.g. the user is passing through a space observed by a camera), body motion
information derived from the video signal is correlated with on-body sensor informa-
tion. The goal is to improve the on-body system by e.g. determining the location of
the sensors on the user’s body.
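One simple way to realize such a correlation is sketched below: the on-body signal is compared with motion signals extracted from the video for several candidate body parts, and the best-matching part is taken as the sensor location. This is a hypothetical illustration assuming both signals are already time-aligned and resampled to a common rate; it is not the method of [14], and all names and values are made up.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant signals."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def locate_sensor(sensor_signal, video_signals):
    """Return the body part whose video-derived motion best matches the sensor signal.

    Both inputs are assumed to be time-aligned and resampled to a common rate.
    """
    return max(video_signals, key=lambda part: pearson(sensor_signal, video_signals[part]))


# Hypothetical motion-energy signals over six video frames.
accel_magnitude = [0.1, 0.9, 0.2, 1.0, 0.1, 0.8]
video_motion = {
    "wrist": [0.2, 1.0, 0.3, 0.9, 0.2, 0.7],
    "hip": [0.5, 0.4, 0.6, 0.5, 0.4, 0.5],
}
print(locate_sensor(accel_magnitude, video_motion))
```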
activities (e.g. shopping or outside dining) are distinguished. During these activities,
the sensor node including the microphone was hung in front of the user’s chest. The
authors propose the use of Haar-like sound features and an HMM classifier, and claim
an average recognition accuracy of nearly 97%.
Apart from using audio sensing for human activity monitoring, Rossi et al. [146]
propose using audio data for context recognition. They discriminate a wide range
of daily life situations, defined by objects (e.g. coffee machine, shaver), locations (e.g.
office, restaurant) or animals and persons (e.g. dog, speech), all of which produce
characteristic sounds. To model these 23 sound context categories, they rely on
crowdsourcing, i.e. on the large amount of openly available audio samples annotated
by various web users.
Although the above described approaches achieve promising results, human activ-
ity monitoring based on audio sensing has several limitations. First of all, the above
mentioned results were mostly achieved under laboratory conditions. In realistic
settings, the performance of such systems is significantly lower due to background
noise, as pointed out e.g. in [197]. Moreover, while certain activities (typically ADLs)
produce characteristic sounds, many other activities (e.g. different ambulation
or sport activities) have no specific audio pattern. Nevertheless, using audio data
in combination with other types of sensors is beneficial, as discussed in Section 2.3.7.
Finally, similar to image-based sensing, privacy is a major issue when using audio
sensors. A solution could be to compute sound features directly on the wearable device,
as proposed by [106].
aesthetics of objects in the home. A solution to overcome these issues is to apply the
mimic sensors proposed in [106] (cf. Section 2.2.3). Another issue of activity
monitoring based on object use is that this concept is mainly restricted to home
settings and is not feasible elsewhere. Moreover, activities that do not require object
interaction cannot be handled by this concept alone. Therefore, to recognize all kinds
of activities of an individual’s daily routine (including e.g. locomotion or sport
activities), a combination with other types of sensors is required, as suggested in Section 2.3.7.
location (sub-areas, such as kitchen or living room, are defined in a smart home) is
used to improve the recognition rate of a system which uses RFID tags to recognize
ADLs performed by elderly people. They investigate two different ways to include the
location information in their existing system: by introducing a new feature based on the
location information, and by using location information to filter out irrelevant sensor
readings.
The combination of inertial data and RFID tags is presented by Stikic et al. [166]
to recognize 10 housekeeping activities. The combination of inertial sensors and mi-
crophones is used by Lukowicz et al. [101] to recognize workshop activities. Finally,
the combination of inertial and physiological sensors has been used successfully for
physical activity monitoring. For example, Crouter et al. [36] show that combining
acceleration and heart rate data improves the intensity estimation of performed
activities compared to using inertial data only. Moreover, the combination of
accelerometers and a heart rate monitor will be used throughout this thesis for phys-
ical activity recognition and intensity estimation.
As a final note of this section, integrating different types of sensors into one device
is clearly preferable to deploying them separately. This
statement is especially true for long-term wearable sensing, where the user’s comfort
should be taken into account. For this reason, modern smartphones are clearly
interesting for activity monitoring applications, especially when the required sensors
do not depend critically on the orientation and location of the device with respect to
the user’s body. Most state-of-the-art smartphones provide a long list of sensors:
camera, GPS, accelerometer, gyroscope, microphone, compass, ambient light sensor,
proximity sensor, etc. A survey on mobile phone sensing can be found e.g. in [88].
• Decision tree classifiers, including custom decision trees [15, 43, 117] and vari-
ous automatically generated decision tree algorithms (C4.5, ID3) [15, 61, 117].
• Bayesian classifiers, e.g. the Naive Bayes classifier [15, 61, 100].
• Instance based classifiers, such as the k-nearest neighbors (kNN) method [61,
108, 197].
• Markov models, including hidden Markov models (HMM) [68, 78, 182, 200]
and conditional random fields (CRF) [171, 180].
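Of the classifier families listed above, the instance-based k-nearest neighbors method is the simplest to sketch. The two-dimensional features below (mean and standard deviation of the acceleration magnitude in a window) and their values are hypothetical, chosen only to show the majority vote among the k closest training samples under Euclidean distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify feature vector x by majority vote among its k nearest training samples."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features per window: (mean acceleration magnitude in m/s^2, its std).
train_X = [(9.8, 0.05), (9.7, 0.04), (10.5, 2.0), (10.8, 2.3), (12.0, 6.0), (12.5, 6.5)]
train_y = ["lie", "lie", "walk", "walk", "run", "run"]
print(knn_predict(train_X, train_y, (10.6, 2.1)))
```

In practice, features are usually normalised first, since features with large numeric ranges would otherwise dominate the distance.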
2.5 Applications
This section discusses application areas of human activity monitoring. Related work
presents different forms of activity recognition, and shows that these are broadly ap-
plicable. Lockhart et al. [99] give a survey on mobile activity recognition applications.
They argue that little practical work has been done in the area of applications in mo-
bile devices so far. Moreover, they define three major types of applications: those
that benefit end users, those that benefit developers and third parties, and those that
benefit crowds and groups. However, these types of applications are not mutually
exclusive. Therefore, the following subsections directly list and briefly describe
different application areas.
and applications utilize a broader range of sensors and provide more detailed
information, e.g. about the intensity and duration of performed physical activities,
stairs climbed, etc. An example commercial product is the Fitbit system [50], a small
clip-on device that contains a 3D motion sensor and provides the functionality
described above. Concerning research in this area, there exists a
large body of related work on assessing the intensity (e.g. in [118, 187]) or
recognizing the type of performed physical activities (e.g. in [28, 43]), or both [137, 173].
Moreover, it was also shown that by detecting the type of performed activities, the
estimation of energy expenditure can be improved [2, 22].
Apart from applications monitoring an individual’s physical activities in general,
there exists work on monitoring specific sport activities. For example, Strohrmann
et al. [168] investigate the potential of wearable sensors to derive kinematic features
in running. With two miniature inertial measurement units, attached to the athlete’s
foot and hip, the authors could distinguish between experienced and inexperienced
runners. Another example of monitoring a specific sport is given by Bächlin et al. [13],
who analyze ski-jumping from on-body acceleration data. With sensors attached to
the athlete’s legs, arms and chest, the authors could identify characteristic motion pat-
terns and extract biomechanically descriptive parameters. Furthermore, Ladha et al.
[87] present a climbing performance analysis system. They capture a climber’s move-
ments through an accelerometer-based wearable sensing platform, automatically de-
tect climbing sessions and moves, and assess parameters related to core climbing
skills: power, control, stability and speed.
2.5.2 Healthcare
For example, the long-term monitoring of a patient’s daily life can be used to detect
changes or unusual patterns that could indicate early symptoms of diseases such as
Alzheimer’s or Parkinson’s disease. These symptoms might not even occur during
short medical appointments. Therefore, integrating results of activity monitoring
into out-of-hospital services is of importance. Lau et al. [90] investigate how ac-
tivity recognition with a smartphone can support patient monitoring and improve
telemedicine services.
Finally, a mobile healthcare application called BeWell [27] is briefly described
here. It shows how mobile activity monitoring can be integrated into our daily life,
and promote multiple aspects of physical and emotional well-being. The BeWell ap-
plication continuously tracks user behaviour along three distinct health dimensions
without requiring any user input. It automatically infers the user’s sleep duration
(based on phone usage), physical activity (based on the phone’s accelerometer, dis-
tinguishing between the activity classes walking, running and stationary) and social
interaction (based on ambient speech during a day and the usage of social applica-
tions on the smartphone). For all three components a score between 0 and 100 is
computed. Using these scores, persuasive feedback is given to the user in the form of
an animated aquatic ecosystem, rendered as an ambient display on the smartphone’s
home screen [27].
ered in the following fields: aircraft maintenance, car production, healthcare, and
emergency response. The goal was to use wearable technology and activity recogni-
tion to provide a summary of performed activities, to provide hands-free access to
e.g. electronic manuals, or to assist in the training of new workers.
Concrete examples of the monitoring and support of manufacturing tasks are
given e.g. in [165, 188]. Ward et al. [188] present an approach to continuous activity
recognition using on-body sensing. They combine data from wearable microphones
and accelerometers, and recognize a set of workshop activities such as sawing, ham-
mering or drilling. Stiefmeier et al. [165] present a system for tracking workers in
car manufacturing plants, investigating two scenarios. In the scenario of an assem-
bly task (the installation of the front lamp) both wearable and environmental sensors
are used. Since it requires instrumenting the cars, this scenario is not feasible in
production and is therefore restricted to training environments. The second scenario investigates
the quality check in the manufacturing process and relies only on wearable sensors
integrated into a jacket.
Examples of using human activity recognition to support workers in the
service sector, specifically in hospital environments, are given e.g. in [7, 47]. Altakouri
et al. [7] investigate to what degree automatic activity recognition could support the
use of prioritized lists for mobile phone-based nursing documentation. They show
that the activity recognition-based list selection improves both the system’s usability
and acceptance, considering parameters such as time effort, interaction complexity, er-
ror rate and subjective system perception. Finally, Favela et al. [47] demonstrate that
mobile activity recognition systems can build pervasive, context- and activity-aware
networks for the monitoring of hospital staff, thus providing important information
for colleagues.
is to encourage physical activity and to modify the user’s daily behaviour (e.g. taking
the stairs instead of the elevator).
In the field of robotics, social robots need to detect and track humans and
recognize their activities [167]. This is key to effectively integrating robots into
people’s workflows and to enabling natural human-robot interaction in a variety of scenarios.
An example of a military application is presented by Minnen et al. [111]. They use
activity recognition for the automatic generation of post-patrol reports, i.e. to
summarize what happened during a patrol of several hours. Finally, targeted or context-
aware advertising is an evolving application area. An example is given by Partridge
and Begole [121], who display ads that are relevant to the user, based on the user’s
current or frequent activities.
2.6 Conclusion
In this chapter a general overview of recent, state-of-the-art research related to hu-
man activity monitoring has been presented. A wide range of technologies, methods
and solutions have been highlighted for the different components and aspects of such
systems. Four major topics have been discussed in this chapter, namely the type of
monitored activities, the type of applied sensing modalities, different machine learn-
ing methods, and finally various application areas of activity monitoring systems.
The rest of this thesis focuses on only a fraction of the approaches presented here,
as specified in this section.
From the wide range of activities monitored and recognized in related work, this
thesis focuses on low-level activities. The term physical or aerobic activities will be
used throughout this work, referring both to basic locomotion activities (e.g. walking,
running or cycling) and further everyday, household and fitness activities. Moreover,
the stationary activities (or inactivities) lying, sitting and standing are included, since
in many scenarios distinguishing activity and inactivity is important. Overall, the
goal is to describe most of an individual’s daily routine from the physical activity
point of view. The concrete list of included activities will be given in the respective
chapters. Generally, this depends on the dataset used, as described in Chapter 3.
Due to the defined list of activities, the usage of ambient sensors is not feasible in
this thesis. Therefore, only wearable sensors are considered hereafter. As discussed
in Section 4.1, the combination of accelerometers and a heart rate monitor will be
used throughout this work. As pointed out in Section 2.3, these types of sensors are
complementary to each other. Therefore, as suggested by various related work,
combining them is expected to be beneficial for recognizing the type and estimating the
intensity of performed physical activities.
Considering machine learning methods, Section 4.2 will present a complete data
processing chain for physical activity monitoring. This also includes an analysis and
evaluation of a wide range of classification methods, focusing only on
supervised approaches. Moreover, meta-level classifiers, especially various boosting
algorithms, are an important topic of this thesis, as presented in Chapter 6.
Finally, as discussed in Section 1.2, the main motivation for developing the different
methods in this thesis is their use in healthcare applications. By precisely
monitoring physical activities, the solutions presented here can tell to what extent individuals
follow general or custom recommendations. However, the proposed approaches can
also be directly used in general fitness applications, where detailed information about
the intensity and duration of performed physical activities is of interest for the user.
3 Datasets for Physical Activity Monitoring
3.1 Introduction
Most established research fields are characterized, among other things, by publicly
available, standard benchmark datasets. Such datasets have many benefits: different
and new approaches can be compared to each other, no research time has to be spent
on laborious data collection, standardized testbeds can be created, etc. Ideally,
datasets in the field of physical activity monitoring reflect natural behaviour, are
recorded from many different subjects performing a wide range of activities, and are
fully annotated with ground truth. Unfortunately, due to various difficulties concerning
hardware and annotation (all discussed below in this chapter) and due to privacy
issues, only a few datasets are publicly available. Moreover, even these few
datasets show significant limitations, so there is no commonly used, standard
dataset. Therefore, this chapter presents two new datasets for physical activity
monitoring, both made publicly available for the research community. These
datasets are also used for benchmarking in Chapter 4, showing the difficulty of common
classification problems and exposing some challenges in this research field.
room equipped with a kitchen. This dataset uses numerous sensors attached to the
body of the participants and in the environment, and contains over 25 hours of sen-
sor data. The TUM Kitchen dataset [174] was created and made publicly available
for research in the areas of markerless human motion capture, motion segmentation
and human activity recognition. The dataset provides video data from 4 fixed cam-
eras, RFID (radio-frequency identification) tag and reed switch readings and action
labels. Finally, Baños et al. [10] presented a benchmark dataset with the specific goal
to evaluate sensor displacement in activity recognition. The dataset includes 33 fit-
ness activities, recorded using 9 inertial sensor units from 17 subjects.
The goals of physical (aerobic) activity monitoring are to estimate the intensity of
performed activities and to recognize activities like sitting, walking, running or cycling.
Compared to activity recognition in e.g. ADL or industrial scenarios, the focus and
challenges in this field are different, due to differing conditions (considering e.g.
the sensor setup: only a few wearable sensors can be used). Since the characteristics of
the activities in this field also differ significantly from the specific activities of home
or industrial settings, different approaches are required, e.g. features are usually
calculated on longer intervals.
Therefore, datasets specifically created for physical activity monitoring are nec-
essary. However, only a few, limited datasets are publicly available in this research
field. The DLR dataset [53] contains 4.5 hours of annotated data from 7 activities per-
formed by 16 subjects, wearing one belt-mounted inertial measurement unit (IMU).
Bao and Intille [15] present a data recording of 20 different activities with 20 subjects,
wearing five 2-axis accelerometers, and show results in activity recognition with 4
different classifiers. The Opportunity dataset contains 4 basic modes of locomotion:
lying, sitting, standing and walking [103, 144]. Finally, in the dataset introduced by
Xue and Jin [195] a protocol of 10 different activities was followed by 44 subjects,
wearing one 3-axis accelerometer.
Data recording for physical activity monitoring faces some difficulties compared to
data collection in e.g. home environments, resulting in less comprehensive and
established datasets. For instance, a robust hardware setup consisting of only wearable
sensors is required, because activities such as running put high stress on the
setup. Moreover, parallel video recording for the purpose of offline annotation (a
widely used method in other fields, such as the monitoring of daily activities in home
environments) is not feasible if outdoor activities are included in the data collection.
Therefore, only online annotation of the performed activities is possible for creating
a reliable ground truth. As a result, there is a lack of a commonly used, standard
dataset and established benchmarking problems for physical activity monitoring.
This chapter introduces two new datasets for physical activity monitoring, both
made publicly available. Based on the conditions and limitations of the public da-
tasets described above, the following criteria were defined for the creation of the
datasets: a wide range of everyday, household and fitness activities should be
performed by an adequate number of subjects, wearing a few 3D-IMUs and a heart rate
(HR) monitor. The reason for requiring an HR-monitor in addition to the commonly
used inertial sensors is that physiological sensing, which is missing in other public
datasets, is especially useful for the intensity estimation of physical activities. For
example, inertial sensing alone cannot reliably distinguish activities with similar
movement characteristics but different energy expenditure, e.g. walking and ascending
stairs, or, as an even more difficult example, walking and walking with a load.
A further requirement for the new datasets is that the participating subjects should
have the freedom to execute activities however they want. It has been pointed out in
previous work (e.g. in [15]) that (semi-)naturalistic data collection provides more
realistic training and test data, and permits greater subject variability in behaviour,
than data recorded in a laboratory setting. For example, subjects should be allowed
to freely walk in- or outdoors during data capture, instead of specifying locomotion
activities on e.g. a treadmill.
The rest of this chapter is organized in the following way: Section 3.2 introduces
the PAMAP dataset and Section 3.3 presents the PAMAP2 dataset. The PAMAP data-
set contains data from 14 activities and 8 subjects, wearing 3 IMUs and a HR-monitor.
The PAMAP2 dataset was recorded with a similar sensor setup from 9 subjects, per-
forming 18 different physical activities. For both datasets, the hardware setup, the
data collection protocol, etc. will be described in detail. Moreover, lessons learnt
from these data recordings are discussed in Section 3.4. Finally, the chapter con-
cludes in Section 3.5, reflecting also on the impact of making these datasets publicly
available.
Figure 3.1: PAMAP dataset: placement of IMUs (red dots) and the data collection
unit (blue rectangle).
3-axis accelerometer is used from an IMU, which has a resolution of 0.038 m s−2 in the
range ±16g. Of the 3 IMUs, one was attached above the wrist of the dominant arm,
one on the chest of the test subjects, and one sensor was foot-mounted.
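The stated resolution can be related to the measurement range with a short calculation. This is only a plausibility check, assuming that g denotes standard gravity (9.80665 m s−2); the text does not state the bit depth of the sensor's analog-to-digital converter.

```python
import math

G = 9.80665  # standard gravity in m/s^2 (assumption: "g" in the text means standard gravity)
RANGE_G = 16  # measurement range is +/-16 g
RESOLUTION = 0.038  # m/s^2 per output step, as stated in the text

span = 2 * RANGE_G * G  # total measurable span in m/s^2
levels = span / RESOLUTION  # number of distinguishable output levels
bits = math.log2(levels)  # equivalent number of bits
print(f"span = {span:.1f} m/s^2, levels = {levels:.0f}, bits = {bits:.2f}")
```

The result, a span of about 313.8 m s−2 divided into roughly 8,258 levels, corresponds to approximately 13 bits, which is at least consistent with a 13-bit output word.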
A Sony Vaio VGN-UX390N UMPC was used as inertial data collection unit, car-
ried by the subjects in a pocket fixed on their belt. The placement of the sensors and
this data collection unit is shown in Figure 3.1. The IMUs were connected to the Sony
Vaio UMPC by USB-cables, which were taped to the body so that they did not restrict
normal movements of the subjects. To obtain heart rate information, the Garmin
Forerunner 305, a GPS-enabled sports watch with integrated HR-monitor, was used.
The applied sensors (3 IMUs and a HR-monitor) define 3 positions on a subject’s
body, since the chest IMU and the HR-monitor are both placed at the same position.
Previous work (e.g. [122]) showed that, in the trade-off between classification
performance and number of sensors, using 3 sensor locations is the most effective. In
systems for physical activity monitoring the number of sensor placements should be
kept at a minimum, for reasons of practicability and comfort – since users of such
systems usually wear them for many hours a day. On the other hand, a thorough
analysis of sensor positions in Section 8.2 shows that fewer than 3 sensor positions are
not sufficient for accurate activity recognition.
During data collection, a supervisor accompanied the test subjects and marked
the beginning and end of each of the different activities. These timestamped activity
labels were stored on the data collection unit. Synchronization of the timestamped
inertial data, annotations and heart rate data was carried out offline. The data format
used in the published dataset is given in Appendix B, Table B.1.
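The offline synchronization step can be sketched as follows, assuming that all three streams have already been timestamped against a common clock. Each inertial sample receives the activity label of the annotated interval that contains it, and the lower-rate heart rate is attached by holding the most recent reading. The function names, the use of 0 for unlabelled samples and the example values are assumptions of this sketch, not the format of the published dataset.

```python
import bisect


def label_samples(sample_times, annotations, unlabeled=0):
    """Assign an activity ID to each sample timestamp.

    annotations: (start, end, activity_id) tuples, sorted by start and non-overlapping.
    Samples outside every annotated interval get the `unlabeled` ID.
    """
    starts = [start for start, _, _ in annotations]
    labels = []
    for t in sample_times:
        i = bisect.bisect_right(starts, t) - 1  # last interval starting at or before t
        if i >= 0 and t < annotations[i][1]:
            labels.append(annotations[i][2])
        else:
            labels.append(unlabeled)
    return labels


def align_heart_rate(sample_times, hr_times, hr_values):
    """Zero-order hold: attach the most recent heart rate reading to each sample."""
    aligned = []
    for t in sample_times:
        i = bisect.bisect_right(hr_times, t) - 1
        aligned.append(hr_values[i] if i >= 0 else None)  # None before the first reading
    return aligned


imu_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]  # seconds
annotations = [(0.4, 1.6, 1), (2.2, 3.0, 2)]  # hypothetical activity IDs
print(label_samples(imu_times, annotations))
print(align_heart_rate(imu_times, [0.2, 1.2, 2.2], [63, 70, 75]))
```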
Table 3.1: PAMAP dataset: protocol of data collection. Left side: indoor activities,
right side: outdoor activities.

Activity          Duration [min]    Activity          Duration [min]
Lie               3                 Walk very slow    3
Sit               3                 Break             1
Stand             3                 Normal walk       3
Iron              3                 Break             1
Break             1                 Nordic walk       3
Vacuum clean      3                 Break             1
Break             1                 Run               3
Ascend stairs     1                 Break             2
Break             2                 Cycle             3
Descend stairs    1                 Break             1
Break             1                 Run               2
Ascend stairs     1                 Normal walk       2
Descend stairs    1                 Break             2
                                    Play soccer       3
                                    Break             2
                                    Rope jump         2
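The protocol of Table 3.1 can be encoded as a plain list of (activity, duration) entries, from which the total protocol time and the time spent on breaks follow directly. The list below simply contains all entries of the table; the function name and output keys are hypothetical.

```python
from collections import defaultdict

# All entries of Table 3.1 as (activity, duration in minutes); "Break" marks rests.
PROTOCOL = [
    ("Lie", 3), ("Sit", 3), ("Stand", 3), ("Iron", 3), ("Break", 1),
    ("Vacuum clean", 3), ("Break", 1), ("Ascend stairs", 1), ("Break", 2),
    ("Descend stairs", 1), ("Break", 1), ("Ascend stairs", 1), ("Descend stairs", 1),
    ("Walk very slow", 3), ("Break", 1), ("Normal walk", 3), ("Break", 1),
    ("Nordic walk", 3), ("Break", 1), ("Run", 3), ("Break", 2), ("Cycle", 3),
    ("Break", 1), ("Run", 2), ("Normal walk", 2), ("Break", 2),
    ("Play soccer", 3), ("Break", 2), ("Rope jump", 2),
]


def summarize(protocol):
    """Total minutes, break minutes, activity minutes and number of distinct activities."""
    per_activity = defaultdict(int)
    for name, minutes in protocol:
        per_activity[name] += minutes
    breaks = per_activity.pop("Break", 0)
    activity = sum(per_activity.values())
    return {"total": breaks + activity, "breaks": breaks,
            "activity": activity, "n_activities": len(per_activity)}


print(summarize(PROTOCOL))
```

The 14 distinct activities obtained this way agree with the number of activities stated for the PAMAP dataset.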
3.2.2 Subjects
Eight subjects participated in the data collection, seven males and one female. The
subjects were employees at a research institute, aged 27.88 ± 2.17 years, and had a
BMI of 23.68 ± 4.13 kg m−2. One subject was left-handed, all the others were
right-handed. Detailed information about each of the test subjects is given in Appendix B,
Table B.3.
fined activities in the way most suitable for the subject. Therefore, a semi-naturalistic
data collection was carried out when recording the PAMAP dataset, following the
specifications defined in Section 3.1.2. A brief description of each of the activities
can be found in Appendix B, Table B.5.
One of the goals of physical activity monitoring is to estimate the intensity of
performed activities. An HR-monitor is included in the hardware setup of the data
collection so that heart-rate-related features can be considered for this task during
data processing (cf. Chapter 4). Therefore, a short break is inserted in the data
collection protocol after most of the activities. The duration of the breaks was chosen
so that the heart rate of the subjects could return to the “normal” range
after performing an activity. The goal was to ensure that the measured heart rate
was unaffected by the previous activities. For this purpose, a 1-minute break was
sufficient after most of the activities, except for the most exhausting ones (ascending
stairs, running and playing soccer), after which a 2-minute break was
inserted. However, since in everyday situations the influence of activities on the next
performed ones cannot be excluded, this influence was also simulated in the data
collection protocol: descending stairs was performed directly after ascending stairs and
normal walking directly after running (cf. Table 3.1).
Figure 3.2: PAMAP2 dataset, GUI used for the data collection: start screen of
the labeling tool. This screenshot was taken while the subject was performing the
activity sit during data collection. As indicated by the green symbols in the top left
corner, all sensors are operating correctly. Moreover, the subject’s heart rate is
63 beats per minute at the moment of this screenshot, as also indicated in the top left
corner.
Figure 3.3: PAMAP2 dataset, GUI used for the data collection: labeling of various
everyday activities.
Figure 3.3. This labeling tool offers the possibility to label the basic activities and
postures on its start screen, as shown in Figure 3.2. Moreover, on various other pages,
the labeling of a wide range of everyday, household and sport activities is possible
as well. Figure 3.3 shows the labeling screen for various everyday activities as an
example, while a subject is performing the activity ascend stairs. In addition, symbols
in the top left corner of the GUI (cf. both Figure 3.2 and Figure 3.3) indicate whether
the three IMUs and the HR-monitor are continuously operating, thus whether the
data acquisition is running smoothly.
During data collection, the beginning and end of each of the different performed
activities were marked with the described labeling tool on the Viliv UMPC, thus pro-
viding timestamped activity labels along with the raw sensory data. The collection
of all raw sensory data and the labeling were implemented in separate threads in an
application running on the collection unit, to ease the synchronization of all collected
data. The data format of the PAMAP2 dataset is given in Appendix B, Table B.2.
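The internals of the collection application are not described here. The following Python sketch only illustrates the general pattern under stated assumptions: each stream (sensors and labeling) runs in its own thread, every record is stamped with the same monotonic clock of the collection unit, and all records are merged by timestamp afterwards. All names, rates and the dummy readings are hypothetical.

```python
import queue
import threading
import time


def sensor_worker(stream, n_samples, period_s, out_q):
    """Hypothetical acquisition loop; a real one would read from the device."""
    for i in range(n_samples):
        out_q.put((time.monotonic(), stream, i))  # dummy reading i
        time.sleep(period_s)


def labeling_worker(events, out_q):
    """Emit (label, delay) events, as a GUI callback would when a button is pressed."""
    for label, delay_s in events:
        time.sleep(delay_s)
        out_q.put((time.monotonic(), "label", label))


def collect():
    """Run all workers in parallel and merge their records in timestamp order."""
    q = queue.Queue()  # thread-safe queue shared by all workers
    workers = [
        threading.Thread(target=sensor_worker, args=("imu_hand", 20, 0.01, q)),
        threading.Thread(target=sensor_worker, args=("hr", 2, 0.1, q)),
        threading.Thread(target=labeling_worker, args=([("sit", 0.05), ("stand", 0.1)], q)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    records = []
    while not q.empty():
        records.append(q.get())
    return sorted(records, key=lambda r: r[0])


for record in collect()[:5]:
    print(record)
```

Because all records carry timestamps from one clock, labels and sensor samples can later be matched without any cross-device clock alignment.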
3.3.2 Subjects
In total nine subjects participated in the data collection, eight males and one female.
The subjects were mainly employees or students at a research center, aged 27.22 ± 3.31
years, and had a BMI of 25.11 ± 2.62 kg m−2. One subject was left-handed, all the
others were right-handed. Detailed information on each of the test subjects is given
in Appendix B, Table B.4.
range of other activities should be included. Each of the subjects had to follow this
protocol, performing all defined activities in the way most suitable for them.
Furthermore, a list of optional activities to perform was also suggested to the sub-
jects. The idea of these optional activities was to further enrich the range of activities
in the recorded dataset. Activities from this optional list were only performed by
some of the subjects if the circumstances made it possible, e.g. if the subject had
additional free time after completing the protocol, if the equipment needed for an
optional activity was available, and if the hardware setup made further data
recording possible. In total, 6 different optional activities were performed by some of
the subjects: watching TV, computer work, car driving, folding laundry, house cleaning
and playing soccer.
The created PAMAP2 dataset therefore contains data from 18 different activities in
total. A brief description of each of these activities can be found in Appendix B,
Table B.6. Most of the activities from the protocol were performed for approximately
3 minutes, except ascending/descending stairs (due to limitations of the building where
the indoor activities were carried out) and rope jumping (to avoid exhaustion of the
subjects). Breaks between activities in the protocol were inserted for the same reason
as explained in Section 3.2.3 for the PAMAP dataset. The optional activities were
performed as long as the subjects wished, or as long as it took to finish a task (e.g.
arriving with the car at home or completely finishing dusting a bookshelf).
data capturing. On the other hand, attaching the sensors and the custom bag for
recording the PAMAP2 dataset was straightforward: the entire setup took no more
than 5 minutes. All subjects reported that the sensor fixations were comfortable
and did not restrict normal movements at all. Only the custom bag sometimes felt
uncomfortable during intensive movements (e.g. running). A smaller collection unit,
e.g. a smartphone, would be advisable for similar data collections.
One aspect that should not be underestimated is the weather. As opposed to
most of the datasets collected in the research field of activity recognition (recorded
e.g. in home or industrial settings), a significant part of the two datasets presented in
this chapter had to be recorded outdoors. Since most of the subjects preferred not to
run or cycle in very hot, cold or rainy conditions, and the entire data collection took
several days, careful planning and consulting the weather forecast were required when
scheduling the subjects.
Problems during such complex and long data recordings are inevitable. The setup
used for the PAMAP dataset had several weaknesses due to the wired connection of
the IMUs. Overall, this caused a significant amount of data loss, motivating the
improvement of the system. For the setup used for the PAMAP2 dataset (the improved
prototype described above), there were two main reasons for data loss. The first
reason is data dropping caused by glitches in the wireless data transfer. However,
this effect was minor: the 3 IMUs had a real sampling frequency (the nominal sampling
rate corrected for the overall occurrence of dropped data) of 99.63 Hz, 99.78 Hz and
99.65 Hz at the hand, chest and ankle placements, respectively, compared to the
nominal sampling frequency of 100 Hz. Data loss on the wireless HR monitor occurred
even more rarely, and is also less critical than on the IMUs.
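The reported real sampling frequencies can be read as the nominal rate scaled by the fraction of samples that actually arrived. The sketch below assumes exactly that interpretation; the helper names `effective_rate` and `drop_fraction` are hypothetical and not part of any PAMAP2 tooling.

```python
def effective_rate(nominal_hz: float, received: int, expected: int) -> float:
    """Nominal rate scaled by the fraction of expected samples that arrived."""
    return nominal_hz * received / expected


def drop_fraction(nominal_hz: float, real_hz: float) -> float:
    """Fraction of samples lost, implied by a measured real rate."""
    return 1.0 - real_hz / nominal_hz


# Real sampling rates reported for the PAMAP2 IMUs (nominal rate: 100 Hz).
REAL_HZ = {"hand": 99.63, "chest": 99.78, "ankle": 99.65}

for placement, hz in REAL_HZ.items():
    print(f"{placement}: {100 * drop_fraction(100.0, hz):.2f}% of samples dropped")
```

Under this reading, the hand, chest and ankle IMUs lost roughly 0.37 %, 0.22 % and 0.35 % of their samples, respectively.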
The second, more severe reason for data loss in the PAMAP2 recording was the
somewhat fragile system setup caused by the additionally required hardware compo-
nents: 2 USB dongles, a USB hub and a USB extension cable were added to the collec-
tion unit in the custom bag. Especially during activities such as running or rope
jumping, the system was exposed to considerable mechanical stress. This sometimes
caused a loss of connection to the sensors, or even a system crash, in which case the
data recording had to be restarted; in a few cases the data collection could not be
recovered even this way. As a result, some activities of certain subjects are partly or
completely missing from the dataset. To minimize such problems, it is preferable to
use a sensor setup entirely from one company (so that no second dongle is needed),
or, even better, sensors with standard wireless communication (although the Trivisio
sensors use the 2.4 GHz ISM band, they use a specific communication protocol, and
thus a USB dongle is needed for wireless data streaming). Alternatively, local storage
on the sensors, made possible by new sensor solutions recently appearing on the
market, should be considered for future data collections.
3.5 Conclusion
In the field of physical activity monitoring there is a lack of a commonly used, stan-
dard dataset and established benchmarking problems. Therefore, this chapter pre-
sented two new datasets (the PAMAP and the PAMAP2 dataset), both made publicly
available. The PAMAP dataset was recorded on 14 physical activities with 8 subjects,
wearing 3 IMUs and a HR-monitor. The PAMAP2 dataset was recorded with a similar
sensor setup from 9 subjects, performing up to 18 different activities. In the respec-
tive sections of this chapter the hardware setup, participating subjects and the data
collection protocol have been described in detail for both datasets.
Since the new datasets cover a wide range of physical activities performed by a
reasonable number of subjects, challenging classification problems can be defined.
This is shown in Chapter 4, where e.g. different intensity estimation and activity
recognition classification tasks defined on the PAMAP2 dataset are benchmarked.
The challenges exposed in this way motivate the improvement of existing data pro-
cessing and classification approaches, resulting in e.g. a new classification algorithm
presented in Chapter 6. Moreover, the rich datasets allow the evaluation of everyday
life scenarios and the development of robust techniques and personalization ap-
proaches for physical activity monitoring, as shown in Chapter 5 and Chapter 7,
respectively.
Apart from their use in this work, the two datasets have had some impact on the
research community since being made publicly available. The PAMAP dataset was
published in October 2011 [139], while the PAMAP2 dataset was published in June
2012 [135, 136]. Moreover, the PAMAP2 dataset was included in the UCI repository
in August 2012 [132]. Despite the relatively short time since the datasets were
published (this chapter was written in August 2013), several research groups have
already used them and have stated that releasing the datasets is a great service to
the research community. Moreover, the PAMAP2 dataset page in the UCI repository
[132] has received more than 11500 page hits (last accessed on 2013-08-29).
Concluding the chapter, this paragraph briefly presents a few major publications
from other research fields, which also make use of the PAMAP or PAMAP2 data-
set. Rakthanmanon et al. [130] use the PAMAP dataset to demonstrate their multi-
dimensional time series clustering algorithm. The PAMAP dataset is used by Hu
et al. [71] as one of the examples for real-world problems to evaluate a novel time
series classification algorithm under more realistic assumptions. Moreover, Rakthan-
manon and Keogh [129] use the PAMAP dataset to evaluate their proposed time series
shapelet discovery algorithm, and to demonstrate that shapelets can also be used as
a high accuracy classification tool for activity recognition. Huang and Schneider [73]
propose spectral learning algorithms for hidden Markov models (HMMs) that incor-
porate static data, and use the PAMAP2 dataset in their experiments to demonstrate
the performance of the new algorithms on real (not synthetic) data. Finally, Clifton
et al. [32] introduce an extreme function theory for novelty detection, and illustrate
their proposed method on the PAMAP2 dataset, used as a benchmark time-series da-
taset.
4 Data Processing and Classification
4.1 Introduction
The goals of this thesis in physical activity monitoring are to estimate the intensity
of performed activities and to identify basic or recommended activities and postures.
These goals are motivated by various health recommendations, as discussed in Chap-
ter 1. For these purposes two datasets have been created: the PAMAP and PAMAP2
datasets, both presented in Chapter 3. These datasets will be used in this chapter to
apply different data processing methods and classification algorithms, and to create
a benchmark of physical activity classification problems.
The PAMAP and PAMAP2 datasets provide raw sensory data from 3 inertial measure-
ment units and a heart rate monitor. Previous work shows that accelerometers out-
perform other sensors for different tasks in physical activity monitoring. Concerning
the intensity estimation of physical activities, work in e.g. [118, 173] shows that
3D-accelerometers are the most powerful sensors. In [118], for example, accelerome-
ters and gyroscopes attached to the wrist, hip and ankle were used to estimate the
intensity of physical activity, and accelerometers were found to outperform gyro-
scopes. Concerning the recognition of physical activities, a study by Pärkkä et al. [117]
analyzed the effect of various sensors. In this study, subjects carried a set of wearable
sensors (3D-accelerometer, 3D-compass, microphone, temperature sensor, heart rate
monitor, etc.) while performing different activities. According to the results, ac-
celerometers proved to be the most information-rich and most accurate sensors for
activity recognition. Pärkkä et al. [117] found that accelerometer signals react quickly
to activity changes and reflect the type of activity well. Therefore, from all 3 IMUs,
only the accelerometer data is used hereafter.
In addition to acceleration data, the heart rate signals provided by both datasets
will also be used. Physiological signals were closely examined in previous work for
intensity estimation. For example, Crouter et al. [36] conclude that combining ac-
celerometer and heart rate data, or using only heart rate information enables good in-
tensity estimation. On the other hand, Tapia et al. [173] show that introducing heart
[Figure 4.1: two panels, each showing chest IMU acceleration (m/s²), heart rate (bpm) and activity ID over time (axis scaled by 10^4). Top panel activities: lie, sit, stand, iron, vacuum clean, ascend stairs, descend stairs. Bottom panel activities: normal walk, Nordic walk, cycle, run, rope jump.]
Figure 4.1: Example raw acceleration (chest IMU up-down direction, shown in
the top row of both plots) and heart rate data (shown in the middle row of both
plots) from the PAMAP2 dataset (activity IDs as given in the dataset are shown
in the bottom row of both plots). The top and bottom plots together show one
subject’s collected data while performing the 12 activities of the defined data
collection protocol. Note: data from the break intervals and transient activities
has been removed from these plots.
4.2 Data Processing Chain
Figure 4.2: The data processing chain. In the preprocessing step (P), synchronized,
timestamped and labeled 3D-acceleration and heart rate data is obtained from the
raw sensory data. This data is segmented in the segmentation step (S) with a sliding
window of 512 samples. In the subsequent feature extraction step (F), a total of 137
different features are extracted. These features serve as input to the classifiers,
which output the estimated intensity level class and the recognized activity class.
4.2.1 Preprocessing
The previously introduced datasets provide timestamped raw sensory data from the
3 IMUs and the heart rate monitor, as well as timestamped activity labels. All this
data is synchronized in the preprocessing step. After this step, synchronized, time-
stamped and labeled acceleration data (as justified above, only accelerometer data
is used from the IMUs) and heart rate data are available.
To deal with wireless data loss (i.e. to handle missing values when applying data
processing and classification techniques), Saar-Tsechansky and Provost [148] pro-
posed different methods. Of these approaches, linear interpolation was selected for
simplicity. Further processing of the raw signals (e.g. filtering) is part of the extrac-
tion of the various features, as described in Section 4.2.3. Finally, to avoid dealing
with potential transient activities, the first and the last 10 seconds of each labeled
activity are deleted.
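A minimal NumPy sketch of the two preprocessing operations described above, assuming each labeled activity is available as a uniformly sampled 100 Hz array in which dropped samples are marked as NaN (the NaN convention and the function names `interpolate_missing` and `trim_activity` are assumptions for illustration):

```python
import numpy as np

FS_HZ = 100   # sampling rate of the PAMAP/PAMAP2 IMU data
TRIM_S = 10   # seconds discarded at the start and end of each labeled activity


def interpolate_missing(signal: np.ndarray) -> np.ndarray:
    """Fill NaN gaps (dropped samples) of one axis by linear interpolation.

    Assumes uniform sampling and at least one valid sample; outside the first/last
    valid sample, np.interp holds the edge value constant.
    """
    signal = np.asarray(signal, dtype=float)
    idx = np.arange(signal.size)
    valid = ~np.isnan(signal)
    return np.interp(idx, idx[valid], signal[valid])


def trim_activity(segment: np.ndarray, fs: int = FS_HZ, trim_s: int = TRIM_S) -> np.ndarray:
    """Drop the first and last `trim_s` seconds of one labeled activity."""
    n = trim_s * fs
    if segment.shape[0] <= 2 * n:
        return segment[:0]   # too short: nothing remains after trimming
    return segment[n:-n]


x = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
print(interpolate_missing(x))   # gaps filled with 2.0, 4.0 and 5.0
```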
4.2.2 Segmentation
Previous work (e.g. [75]) shows that there is no single window length for segmenta-
tion that is best for all activities. To capture at least two or three periods of all
different periodic movements, a window length of about 3 to 5 seconds is reasonable.
For example, experiments presented by Lara et al. [89] showed the best results with
a window size of 5 seconds when using acceleration data for physical activity recog-
nition. For this reason, and to ensure efficient discrete Fourier transform (DFT)
computation for the frequency-domain features2, a window size of 512 samples was
selected. Since the sampling rate of the raw sensory data was 100 Hz in both the
PAMAP and the PAMAP2 datasets, this results in signal windows of 5.12 seconds.
The preprocessed data is therefore segmented using a sliding window of 5.12 seconds,
shifted by 1 second between consecutive windows.
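The segmentation can be sketched as follows, assuming a preprocessed signal stored as a NumPy array with one row per 100 Hz sample, so that a 1-second shift corresponds to 100 samples (the function and constant names are hypothetical):

```python
import numpy as np

WINDOW = 512   # samples per window (5.12 s at 100 Hz)
STEP = 100     # 1 s shift between consecutive windows at 100 Hz


def sliding_windows(signal: np.ndarray, window: int = WINDOW, step: int = STEP) -> np.ndarray:
    """Return overlapping segments with shape (n_windows, window, *signal.shape[1:])."""
    n = signal.shape[0]
    if n < window:
        return np.empty((0, window) + signal.shape[1:], dtype=signal.dtype)
    starts = range(0, n - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])


acc = np.zeros((18000, 3))          # hypothetical 3-minute, 3-axis recording
print(sliding_windows(acc).shape)   # (175, 512, 3)
```

For n samples the number of windows is ⌊(n − 512)/100⌋ + 1; a continuous 3-minute recording (18000 samples) would, for instance, yield 175 windows.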
2 Compare e.g. the commonly used Cooley-Tukey fast Fourier transform (FFT) algorithm [34], which
recursively breaks down a DFT of a discrete signal of length N into smaller DFTs. This procedure is
most efficient when N is a power of 2.
$$\mathrm{INT}_{abs}(x) = \sum_{i=0}^{N-1} |x_i|. \qquad (4.4)$$
Finally, the correlation between a pair of axes is defined as follows, assuming the
two discrete-time sequences are denoted as x and y:

$$r_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}. \qquad (4.5)$$
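Both time-domain definitions translate directly into code. In this sketch (function names hypothetical), the correlation is computed from mean-centred signals, so the normalization constants of the covariance and of the standard deviations cancel; returning 0 for a constant axis is an assumption, since the text does not cover that case.

```python
import numpy as np


def abs_integral(x: np.ndarray) -> float:
    """Eq. (4.4): sum of absolute sample values over one window."""
    return float(np.sum(np.abs(x)))


def axis_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Eq. (4.5): cov(x, y) / (sigma_x * sigma_y) for two axes of one window."""
    xc = x - np.mean(x)
    yc = y - np.mean(y)
    denom = np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    # The 1/N factors of the covariance and the variances cancel in the ratio.
    return float(np.sum(xc * yc) / denom) if denom > 0 else 0.0
```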
For the frequency-domain features, the power spectral density (PSD) is computed first:
X_0, X_1, ..., X_{N−1}, where X_k is the kth element of the PSD. The feature energy is
then defined as follows:

$$\mathrm{Energy} = \sum_{k=0}^{N-1} X_k^2. \qquad (4.6)$$
The feature spectral entropy is defined as:

$$\mathrm{Entropy} = -\sum_{k=0}^{N-1} X_k \log X_k. \qquad (4.7)$$
Finally, the power ratio of two frequency bands, which consist of the first p and q
elements of the PSD, respectively, is defined as follows:

$$\mathrm{Power\_ratio}_{p,q} = \frac{\sum_{k=0}^{p-1} X_k}{\sum_{k=0}^{q-1} X_k}. \qquad (4.9)$$
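A sketch of the three frequency-domain features, assuming the PSD is estimated as a simple periodogram |DFT(x)|²/N over all N bins (the estimator is not specified here, so this is an assumption). Entropy follows Eq. (4.7) applied to the PSD values as given, with 0 · log 0 taken as 0; the band limits p and q in the example are arbitrary.

```python
import numpy as np


def psd(x: np.ndarray) -> np.ndarray:
    """Periodogram estimate |DFT(x)|^2 / N, returning all N bins X_0 ... X_{N-1}."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)


def energy(X: np.ndarray) -> float:
    """Eq. (4.6): sum of squared PSD values."""
    return float(np.sum(X ** 2))


def spectral_entropy(X: np.ndarray) -> float:
    """Eq. (4.7) on the PSD values as given; 0 * log 0 is treated as 0."""
    Xp = X[X > 0]
    return float(-np.sum(Xp * np.log(Xp)))


def power_ratio(X: np.ndarray, p: int, q: int) -> float:
    """Eq. (4.9): power in the first p bins relative to the first q bins."""
    return float(np.sum(X[:p]) / np.sum(X[:q]))


n = np.arange(512)
X = psd(np.sin(2 * np.pi * 10 * n / 512))   # sine at exactly bin 10 of a 512-sample window
print(energy(X), power_ratio(X, p=20, q=512))
```

For this pure sine, the power sits in bins 10 and 502 (128 each), so the energy is about 32768 and the ratio about 0.5.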
The signal features extracted from the 3D-acceleration data are computed for each
axis separately and for the 3 axes together. This results in 108 (= 9 feature types
× 3 sensors × 4, since each of the 3 axes and their combination is computed) plus 9
(the correlation between each pair of axes, computed for all 3 sensors) extracted
features. Moreover, since synchronized data from the 3 IMUs is available, sensors at
different placements can be combined. Of the above features (calculated on the 3
axes of each IMU), mean, standard deviation, absolute integral and energy are accu-
mulated pairwise with weights (e.g. arm plus chest sensor placement). Furthermore,
a weighted sum over all 3 sensors is also added3. This combination of different sen-
sors results in 16 additional features. These derived features are expected to better
describe and distinguish activities with e.g. both upper and lower body movement,
thus improving the recognition of activities that involve movements of multiple body
parts. Moreover, the features combining all 3 sensor placements could in particular
improve the intensity estimation of activities. Overall, 133 (= 108 + 9 + 16) features
are extracted from the segmented IMU acceleration data.
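The feature count above can be checked with a little bookkeeping. The sketch also shows the weighted sum over sensor placements using the weights from footnote 3. Splitting the 16 combined features into 4 features × (3 sensor pairs + all 3 sensors) is one reading that reproduces the stated total, and since the text does not give separate pairwise weights, the hypothetical helper `weighted_combination` simply reuses the footnote-3 weights for any subset of placements (an assumption).

```python
# Weights per sensor location from footnote 3 (chosen heuristically in the thesis).
WEIGHTS = {"chest": 0.5, "arm": 0.2, "ankle": 0.3}


def weighted_combination(values: dict, locations: tuple) -> float:
    """Weighted sum of one feature (e.g. energy) over the given sensor locations.

    Footnote 3 states these weights for the 3-sensor sum; applying them to the
    pairwise sums as well is an assumption of this sketch.
    """
    return sum(WEIGHTS[loc] * values[loc] for loc in locations)


# Feature-count bookkeeping from the text.
per_sensor = 9 * 3 * 4          # 9 feature types x 3 IMUs x (3 axes + all axes together)
correlations = 3 * 3            # 3 axis pairs x 3 IMUs
combined = 4 * (3 + 1)          # 4 features x (3 sensor pairs + all 3 sensors): one reading
imu_total = per_sensor + correlations + combined
hr_total = 4                    # heart rate features (Section 4.2.3)
assert (per_sensor, imu_total, imu_total + hr_total) == (108, 133, 137)

energy_values = {"chest": 2.0, "arm": 5.0, "ankle": 1.0}   # hypothetical feature values
print(weighted_combination(energy_values, ("chest", "arm", "ankle")))
```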
From the heart rate data, the features mean and gradient are calculated, the mean on
both the raw and the normalized heart rate signal. Normalization is done on the inter-
val defined by the resting and the maximum HR. The resting HR of a test subject is
extracted from the 3-minute lying task of the data collection protocol (cf. Section 3.2.3
and Section 3.3.3 for the PAMAP and PAMAP2 datasets, respectively), and is defined
as the lowest HR
3 The weights were selected heuristically and are set to 0.5, 0.2 and 0.3 for the chest, arm and
foot/ankle sensor locations, respectively. The goal was to obtain more meaningful features than by
simply accumulating the feature values from the different sensor placements.
[Figure 4.3: 3D scatter plot of samples of the activities sit, walk, Nordic walk, cycle, ascend stairs and run. Axes: mean of normalized HR; standard deviation of y-axis on chest IMU; peak absolute value of y-axis on arm IMU.]
Figure 4.3: Data processing: example visualization of the feature space. The 6 se-
lected physical activities can mostly be distinguished with the 3 chosen features
(computed from acceleration and heart rate data).
value measured over this period. For the maximum HR (MHR), the subject’s age-
predicted MHR (MHR = 220 − age) is used [173]. The feature gradient, on both the
raw and the normalized heart rate signal, is defined as the difference between the
last and the first element of a window segment:

$$\mathrm{grad}(x) = x_{N-1} - x_0. \qquad (4.10)$$
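The heart rate features can be sketched as below. The linear mapping (HR − HR_rest)/(MHR − HR_rest) is an assumed reading of "normalization on the interval defined by resting and maximum HR"; it yields values above 1 whenever the heart rate exceeds the age-predicted maximum. All function names are hypothetical.

```python
import numpy as np


def resting_hr(hr_lying: np.ndarray) -> float:
    """Resting HR: the lowest HR value during the 3-minute lying task."""
    return float(np.min(hr_lying))


def normalize_hr(hr: np.ndarray, hr_rest: float, age: int) -> np.ndarray:
    """Assumed linear mapping of [resting HR, age-predicted MHR] onto [0, 1]."""
    mhr = 220 - age
    return (hr - hr_rest) / (mhr - hr_rest)


def gradient(x: np.ndarray) -> float:
    """Eq. (4.10): last minus first element of the window."""
    return float(x[-1] - x[0])


def hr_features(hr_window: np.ndarray, hr_rest: float, age: int) -> np.ndarray:
    """The 4 HR features: mean and gradient of the raw and the normalized signal."""
    norm = normalize_hr(hr_window, hr_rest, age)
    return np.array([hr_window.mean(), norm.mean(), gradient(hr_window), gradient(norm)])


print(hr_features(np.array([60.0, 125.0, 190.0]), hr_rest=60.0, age=30))
```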
Overall, 4 features are extracted from the segmented heart rate data. Therefore, in
total 137 (= 133 +4) features are derived from each of the 5.12 seconds long signal seg-
ments, yielding a large feature vector. Different techniques have been applied in the
field of physical activity monitoring to reduce the feature space, e.g. principal com-
ponent analysis (PCA) [8, 192] or Sammon’s mapping [38]. Moreover, various feature
selection methods were applied in other previous work. For example, distribution
bar graphs were created in [117], a heuristic greedy forward search was performed
in [187] and a forward-backward sequential search algorithm was applied in [124] to
select the best features. However, no feature selection or reduction of the feature
space is applied to the extracted feature set in this thesis, since its focus is on the
classification step of the DPC. Thus, all features are used for each of the classifiers
presented hereafter.
Figure 4.3 visualizes a part of the feature space: samples of 6 selected activities
are shown using 3 selected features. The purpose of this plot is to show that differen-
tiating between the performed physical activities is feasible with the set of extracted
features. For example, due to larger arm movements, samples of walk and Nordic walk
can mostly be separated with e.g. the feature peak absolute value, computed on the
4.2.4 Classification
The extracted features serve as input for the next processing step, the classification.
In physical activity monitoring research, especially in activity recognition, various
classification approaches have been applied and have yielded good results. One ben-
efit of the data processing chain of Figure 4.2 is its modularity: any module can easily
be removed and replaced with a different approach, so different classifiers can easily
be tested and compared with each other. This subsection presents various classifica-
tion algorithms commonly used in related work. Preliminary studies are carried out
using data from the PAMAP dataset. Based on the results of these studies, several
classifiers are selected for the benchmarking process, as described in Section 4.4.2.
In this thesis, both intensity estimation and activity recognition are regarded as
classification problems. For the preliminary studies of this subsection, one classifi-
cation task is defined for each, which are justified and described in more detail in
Section 4.4.1. For the intensity estimation task 3 classes are defined: The goal is to
distinguish activities of light, moderate and vigorous effort. From the 14 physical
activities included in the PAMAP dataset lying, sitting, standing, ironing and very
slow walking are regarded as activities of light effort; vacuum cleaning, descending
stairs, normal walking, Nordic walking and cycling as activities of moderate effort;
ascending stairs, running, playing soccer and rope jumping as activities of vigorous
effort. For the activity recognition task 7 classes are defined: lying, sitting/standing
(forming one class), normal walking, Nordic walking, running, cycling and other.
The latter class includes all remaining activities from the PAMAP dataset4 : ironing,
vacuum cleaning, ascending and descending stairs, playing soccer and rope jumping.
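The class definitions of the two preliminary tasks amount to relabeling tables. A small sketch with the activity names as listed above (the dictionary and function names are hypothetical); following footnote 4, very slow walking has an intensity class but no activity recognition class.

```python
# Intensity classes of the 14 PAMAP activities, as listed in the text.
INTENSITY = {
    "light": ["lying", "sitting", "standing", "ironing", "very slow walking"],
    "moderate": ["vacuum cleaning", "descending stairs", "normal walking",
                 "Nordic walking", "cycling"],
    "vigorous": ["ascending stairs", "running", "playing soccer", "rope jumping"],
}
INTENSITY_OF = {act: cls for cls, acts in INTENSITY.items() for act in acts}

# The 7 activity recognition classes; very slow walking is deliberately absent.
ACTIVITY = {
    "lying": "lying", "sitting": "sitting/standing", "standing": "sitting/standing",
    "normal walking": "normal walking", "Nordic walking": "Nordic walking",
    "running": "running", "cycling": "cycling",
    "ironing": "other", "vacuum cleaning": "other", "ascending stairs": "other",
    "descending stairs": "other", "playing soccer": "other", "rope jumping": "other",
}


def relabel(activities, task):
    """Map raw activity names to task classes; None marks samples excluded from the task."""
    table = INTENSITY_OF if task == "intensity" else ACTIVITY
    return [table.get(a) for a in activities]


print(relabel(["ironing", "very slow walking"], "intensity"))   # ['light', 'light']
print(relabel(["ironing", "very slow walking"], "activity"))    # ['other', None]
```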
For the training and evaluation of all classifiers in the preliminary studies of this
subsection (except for the custom decision tree classifier), the Weka toolkit is used
[65]. Weka (Waikato Environment for Knowledge Analysis) is free machine learning
software written in Java. It provides tools for analyzing and understanding data, in-
cluding implementations of a large number of data mining algorithms, and a graphical
user interface for easy data manipulation and visualization. A comprehensive descrip-
tion of the Weka toolkit can be found in [193], along with a thorough grounding in the
machine learning concepts the toolkit uses and practical advice for using its different
tools and algorithms.
4 Except for very slow walking, since this activity was only included in the dataset for the intensity
estimation task, in order to have walking-related activities in all 3 intensity classes.
Figure 4.4: Structure of the custom decision tree classifier, created for the activity
recognition task (PAMAP dataset).
The various classification schemes and results of the preliminary studies are pre-
sented in the following paragraphs.
Base-level classifiers
Base-level classifiers have been widely used for activity monitoring classification tasks.
For example, automatically generated decision trees are applied in [15, 17, 61, 117],
k-nearest neighbors (kNN) in [61, 108], support vector machines (SVM) in [39], Naive
Bayes classifiers in [61, 100], and artificial neural networks (ANN) in [43, 63, 117].

Table 4.2: Accuracy on the activity recognition task with 4 different base-level
classifiers, applied on the PAMAP dataset.
Classifier      Accuracy [%]
C4.5            85.03
kNN             87.62
SVM             62.31
Naive Bayes     74.14
A comparison of base-level classifiers for activity recognition can be found e.g. in
[8, 122]. Of the different classification approaches, C4.5 decision tree, kNN, SVM and
Naive Bayes classifiers are tested in a preliminary study on the activity recognition
task defined above on the PAMAP dataset. Each of these 4 classification methods
was named one of the top 10 data mining algorithms identified by the IEEE Interna-
tional Conference on Data Mining (ICDM) in December 2006 [194].
A detailed introduction into decision tree classification can be found in [147].
C4.5 is a widely used algorithm to generate decision tree classifiers and is imple-
mented in the Weka toolkit. A practical description of choices and settable param-
eters of this algorithm is given in [193].
The k-nearest neighbor algorithm (originally proposed by Fix and Hodges [51])
belongs to the instance-based learning methods. In kNN, a new feature vector is
classified based on the k closest training examples in the feature space.
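A minimal kNN classifier in NumPy, illustrating the instance-based principle rather than reproducing Weka's IBk: it uses plain Euclidean distance without attribute normalization, and a tied vote goes to the tied class with the closest neighbor. The default k = 7 mirrors the kNN setting listed later in this chapter; the function name is hypothetical.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, X_test, k: int = 7):
    """Label each test vector by majority vote among its k nearest training vectors."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # Counter.most_common keeps first-encountered order among equal counts,
    # so a tie is resolved in favor of the class with the closer neighbor.
    return np.array([Counter(y_train[idx].tolist()).most_common(1)[0][0] for idx in nearest])


X_train = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [[0.5, 0.5], [10.5, 10.5]], k=3))   # [0 1]
```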
Support vector machine classifiers select a small number of critical boundary in-
stances called support vectors from each class, and build a linear discriminant function
that separates them as widely as possible [193]. SVM is a useful and popular classi-
fication technique not only because it constructs a maximum margin separator, but
also because, by using different kernel functions, it can form nonlinear decision
boundaries. The Weka toolkit uses the libsvm library; practical advice for using this
tool can be found in [70].
Finally, the Naive Bayes classifier is a simple probabilistic classifier, probably the
most common Bayesian network model used in machine learning. The model as-
sumes that the features are conditionally independent of each other, given the class.
The Naive Bayes model works surprisingly well in practice, even when the conditional
independence assumption does not hold. A thorough overview of probabilistic learning,
Bayesian classifiers and learning Bayesian models is given in [147].
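A Gaussian Naive Bayes sketch, assuming one normal density per class and feature (a common choice for continuous features such as those of Section 4.2.3). The class name and the small variance floor, added to avoid division by zero, are assumptions of this sketch.

```python
import numpy as np


class GaussianNaiveBayes:
    """One normal density per (class, feature); features are treated as
    conditionally independent given the class."""

    def fit(self, X, y, var_floor: float = 1e-9):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_floor
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.mean_[None, :, :]             # (n, C, d)
        log_pdf = -0.5 * (np.log(2 * np.pi * self.var_) + diff ** 2 / self.var_)
        # Independence assumption: per-feature log-likelihoods simply add up.
        log_post = log_pdf.sum(axis=2) + self.log_prior_          # (n, C)
        return self.classes_[np.argmax(log_post, axis=1)]


model = GaussianNaiveBayes().fit([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]],
                                 [0, 0, 0, 1, 1, 1])
print(model.predict([[0.5, 0.5], [10.5, 10.5]]))
```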
The preliminary study with these 4 classification methods was carried out with the
Weka toolkit [65]. For evaluation, the leave-one-subject-out 8-fold cross-validation
protocol was used (more details on this subject independent evaluation technique can
be found in Section 4.3.2). The accuracy (performance measures are described in more
detail in Section 4.3.1) of the different base-level classifiers is shown in Table 4.2;
overall, good results were achieved. However, the results also indicate room for
further improvement in classification accuracy, since even the best result was only
87.62%. Therefore, there is a reasonable demand for developing more complex and
more advanced classifiers to obtain better results in activity monitoring classification
tasks.
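Leave-one-subject-out splitting can be sketched in a few lines, given one subject ID per sample (the generator name is hypothetical). With the 8 PAMAP subjects it produces 8 folds, and no subject's data ever appears in both the training and the test set of a fold.

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (subject, train_idx, test_idx); each fold tests on one subject only."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        yield s, np.flatnonzero(subject_ids != s), np.flatnonzero(subject_ids == s)


subjects = [1, 1, 2, 2, 3]                  # hypothetical per-sample subject IDs
for s, train, test in leave_one_subject_out(subjects):
    print(s, train.tolist(), test.tolist())
```

If scikit-learn is available, its `LeaveOneGroupOut` splitter implements the same scheme.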
Meta-level classifiers
The use of meta-level classifiers for physical activity monitoring problems (cf. e.g.
[205]) is not as widespread as the use of various base-level classifiers. However, the
comparison of base-level and meta-level classifiers on different activity recognition
classification tasks in [131] showed that meta-level classifiers (such as boosting,
bagging, plurality voting, etc.) outperform base-level classifiers, which makes apply-
ing them of interest. Detailed information on ensemble learning (meta-level classi-
fiers) can be found e.g. in [147, 193].
From the various meta-learning algorithms provided by the Weka toolkit, two ensem-
ble learning methods are selected and evaluated in this chapter: bagging and boosting.
The idea behind these methods is to iteratively learn weak classifiers by manipulating
the training dataset, and then to combine the weak classifiers into a final strong clas-
sifier. To briefly describe bagging and boosting, assume that the training dataset
contains N instances (x_i, y_i), i = 1, ..., N, where x_i is the feature vector and
y_i ∈ {1, ..., C} is the annotated class of the instance, and that T iterations
t = 1, ..., T are performed with the weak classifier f(x). In bagging, N instances are
randomly sampled with replacement from the training dataset in each iteration t. The
learning algorithm (the weak classifier f(x)) is applied to this sample, and the result-
ing model is stored. After learning, a new instance is classified by predicting a class
with each of the T stored models; the final decision is the class that has been pre-
dicted most often.
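A bagging sketch following the description above, assuming a hypothetical `fit_weak(X, y)` that returns a callable model. A nearest-centroid learner (also hypothetical) stands in for the weak classifier here purely for brevity; the experiments in this chapter use e.g. C4.5 trees instead.

```python
import numpy as np
from collections import Counter


def fit_nearest_centroid(X, y):
    """Hypothetical weak learner: predict the class with the closest mean vector."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return lambda x: classes[np.argmin(((centroids - x) ** 2).sum(axis=1))]


def bagging_fit(X, y, fit_weak, T=10, seed=0):
    """Train T models, each on N instances drawn with replacement from the N training instances."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    models = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)      # bootstrap sample of size N
        models.append(fit_weak(X[idx], y[idx]))
    return models


def bagging_predict(models, x):
    """Final decision: the class predicted most often by the stored models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


# Toy usage: two well-separated clusters of 20 points each.
X = np.r_[np.zeros((20, 2)), np.full((20, 2), 10.0)] + np.linspace(0, 1, 40)[:, None]
y = np.r_[np.zeros(20, dtype=int), np.ones(20, dtype=int)]
models = bagging_fit(X, y, fit_nearest_centroid, T=11)
print(bagging_predict(models, np.array([0.5, 0.5])), bagging_predict(models, np.array([10.5, 10.5])))
```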
The difference in boosting is that the training dataset is reweighted after each
iteration, and the individual models are also weighted when constructing the final
strong classifier. This way, the weak learners built in subsequent iterations focus on
classifying the difficult instances correctly. Moreover, when constructing the final
classifier, more influence is given to the more successful models. Many variants based
on the idea of boosting exist; cf. Chapter 6 for a thorough description of them. One of
the most widely used variants is AdaBoost, which is also implemented in the Weka
toolkit. Moreover, AdaBoost is identified as one of the top 10 data mining algorithms
by Wu et al. [194]; it is therefore used in the further experiments of this chapter.
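To make the reweighting concrete, the following is a minimal sketch of discrete AdaBoost for two classes (labels −1/+1) with decision stumps as weak learners. This is a swapped-in illustration, not the setup used here: Weka's AdaBoostM1 handles multiple classes, and the experiments of this chapter boost C4.5 trees rather than stumps. Function names are hypothetical.

```python
import numpy as np


def fit_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w (labels in {-1, +1})."""
    best = (np.inf, 0, 0.0, 1)                     # (weighted error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost_fit(X, y, T=20):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    w = np.full(len(y), 1.0 / len(y))              # start with uniform instance weights
    ensemble = []
    for _ in range(T):
        err, j, thr, sign = fit_stump(X, y, w)
        err = max(err, 1e-12)
        if err >= 0.5:                             # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / err)      # more accurate models get more influence
        pred = np.where(X[:, j] <= thr, sign, -sign)
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified instances
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble


def adaboost_predict(ensemble, X):
    X = np.asarray(X, dtype=float)
    score = sum(a * np.where(X[:, j] <= thr, s, -s) for a, j, thr, s in ensemble)
    return np.where(score >= 0, 1, -1)


X = np.arange(1, 7, dtype=float).reshape(-1, 1)
y = np.array([1, 1, -1, -1, 1, 1])
ensemble = adaboost_fit(X, y, T=3)
print(adaboost_predict(ensemble, X))
```

No single stump classifies these toy labels correctly, but after three boosting rounds the weighted combination labels all six points correctly.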
Both bagging and boosting use the same learning algorithm (the same type of weak
classifier, e.g. a decision tree classifier) in each iteration, and combine these T mod-
els into the final strong classifier. In a preliminary study, all 4 base-level classifiers
tested above (C4.5 decision tree, kNN, SVM and Naive Bayes) are evaluated as learning
algorithms for both bagging and boosting, using the Weka toolkit. For the kNN clas-
sifier (which performed best in the experiments above, cf. Table 4.2), neither boosting
nor bagging yielded any improvement. This observation is in accordance with the
results of [131]. On the other hand, boosting and bagging the C4.5 classifier resulted
in a significant improvement in classification accuracy. Moreover, of all the base-level
and meta-level classifiers tested in this subsection, the best results were achieved
with boosted decision trees.
Table 4.3: Confusion matrix on the intensity estimation task, performed on the
PAMAP dataset. The results were achieved with an AdaBoost C4.5 classifier. Each
row shows, in %, how samples of the given annotated intensity class are estimated.

Annotated          Estimated intensity
intensity      light    moderate    vigorous
light          96.49        3.15           0
moderate        5.24       89.74        5.02
vigorous           0        2.32       97.68
Table 4.4: Confusion matrix on the activity recognition task, performed on the
PAMAP dataset. The results were achieved with an AdaBoost C4.5 classifier. Each
row shows, in %, how samples of the given annotated activity are classified.

Annotated          Recognized activity
activity            1        2        3        4        5        6        0
1 lie             100        0        0        0        0        0        0
2 sit/stand         0    82.75     0.56        0        0     0.26    16.42
3 walk              0        0    87.05     0.42     3.54        0     8.99
4 Nordic walk       0        0     1.16    79.22     5.14        0    14.48
5 run               0        0        0        0    96.71        0     3.29
6 cycle             0     3.10        0        0        0    82.94    13.96
0 other             0     4.37     0.04     0.04     0.13     0.08    95.35
This paragraph gives more detailed results of the preliminary experiments achieved
with the best performing classifier (AdaBoost with C4.5 decision trees) on the PAMAP
dataset. Table 4.3 shows the confusion matrix on the intensity estimation task. The
overall accuracy using leave-one-subject-out 8-fold cross-validation is 94.37% with
the boosted decision tree classifier. It is worth mentioning that misclassifications only
occur into “neighbouring” intensity classes: no samples annotated as light intensity
were classified into the vigorous intensity class, and vice versa. Table 4.4 shows the
confusion matrix on the activity recognition task defined on the PAMAP dataset. The
overall accuracy using leave-one-subject-out 8-fold cross-validation is 90.65% with the
boosted decision tree classifier. Most of the misclassifications can be explained by
the introduction of the other (background activity) class: the characteristics of some
of the other activities overlap with some of the basic activity classes to be recognized.
For example, standing and ironing, or running and playing soccer, have similar char-
acteristics. The problem of dealing with background activities in activity recognition
is further analyzed in Chapter 5, where methods are proposed and evaluated in order
to develop robust activity monitoring systems for everyday life.
$$\mathit{precision} = \frac{1}{C} \sum_{i=1}^{C} \frac{P_{i,i}}{R_i} \qquad (4.11a)$$

$$\mathit{recall} = \frac{1}{C} \sum_{i=1}^{C} \frac{P_{i,i}}{S_i}. \qquad (4.11b)$$
It should be noted that, of the metrics defined above, accuracy only considers the
total number of samples. The other metrics take class imbalance into account: nor-
malization is done separately using the total number of samples of each class. This
difference between the performance measures is important, since fewer samples of
some activities in a dataset are not necessarily due to lesser importance of these ac-
tivities, but could be caused e.g. by a more difficult data capture of these activities.
For example, the PAMAP and PAMAP2 datasets are also imbalanced, i.e. certain ac-
tivities occur more frequently than others (cf. Chapter 3). Some results in Section 4.4.3
will also point out the difference between the performance metrics and how these
results should be interpreted.
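The metrics can be computed from a confusion matrix of counts as sketched below. The sketch assumes that P_{i,j} counts samples annotated as class i and recognized as class j, that R_i is the number of samples recognized as class i (column sum) and S_i the number annotated as class i (row sum). The F-measure is taken as the harmonic mean of precision and recall, which is consistent with the values reported in Table 4.6. Function names are hypothetical.

```python
import numpy as np


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def performance(conf):
    """conf[i, j]: number of samples annotated as class i and recognized as class j."""
    conf = np.asarray(conf, dtype=float)
    diag = np.diag(conf)
    S = conf.sum(axis=1)                  # samples annotated as class i
    R = conf.sum(axis=0)                  # samples recognized as class i (must be > 0)
    precision = float(np.mean(diag / R))  # Eq. (4.11a)
    recall = float(np.mean(diag / S))     # Eq. (4.11b)
    accuracy = float(diag.sum() / conf.sum())
    return precision, recall, f_measure(precision, recall), accuracy


p, r, f, a = performance([[90, 10], [5, 5]])   # hypothetical imbalanced 2-class example
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f} accuracy={a:.3f}")
```

In this imbalanced example, accuracy is about 0.86 while recall is only 0.70: the frequent class dominates accuracy, whereas the class-normalized metrics expose the poor performance on the rare class.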
Section 5.1.2 will reflect on the differences between subject dependent and subject
independent evaluation, concluding that subject independent validation techniques
should usually be applied for physical activity monitoring systems. However, in order
to create a widely used and comparable benchmark, both subject dependent and
subject independent evaluations are carried out. For the subject dependent evalua-
tion, standard 9-fold cross-validation is applied in the benchmark (k = 9 was chosen
to have the same number of folds as in the subject independent evaluation).
classes are defined for this problem: activities of light, moderate and vigorous effort.
Obtaining ground truth for this task is less straightforward than for activity recogni-
tion tasks and hence requires a short explanation.
In various previous works on estimating the intensity of physical activity (e.g.
[35, 118]), reference data was collected with a portable cardiopulmonary system (e.g.
Cortex Metamax 3B or Cosmed K4b2). This method has the advantage of providing
precise measurements of an individual’s oxygen consumption. It makes measured
metabolic equivalents (METs) available, which is essential if the task is to estimate
METs from other features [35, 118]. However, the goal in this thesis is only to esti-
mate whether a performed activity is of light, moderate or vigorous effort, since only
this information is needed for the physical activity recommendations (cf. Section 1.2).
Therefore, it is sufficient to use the Compendium of Physical Activities [1] to obtain
reference data for the defined intensity estimation task. This compendium contains
MET levels assigned to 605 activities. It was used e.g. in the recommendations given
by Haskell et al. [66] to provide example activities of moderate and vigorous intensity.
Moreover, the compendium was used for the validation of MET estimation in related
work, e.g. in [109].
The ground truth for the defined rough intensity estimation task is thus based on the metabolic equivalents of the different activities, as provided by Ainsworth et al. [1]. Using the set of activities from the PAMAP2 dataset, the 3 classes are therefore defined as follows: lying, sitting, standing and ironing are regarded as activities of light effort (< 3.0 METs); vacuum cleaning, descending stairs, normal walking, Nordic walking and cycling as activities of moderate effort (3.0-6.0 METs); ascending stairs, running and rope jumping as activities of vigorous effort (> 6.0 METs).
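The threshold rule can be written as a small function. This is only a sketch: the MET values fed into it would be looked up per activity in the Compendium [1], and assigning the boundary value of exactly 6.0 METs to the moderate class (reading "3.0-6.0" as inclusive) is an assumption.

```python
def intensity_class(met):
    """Map a MET value to the rough intensity class (< 3.0 / 3.0-6.0 / > 6.0 METs)."""
    if met < 3.0:
        return "light"
    if met <= 6.0:  # assumption: exactly 6.0 METs is counted as moderate
        return "moderate"
    return "vigorous"


# Illustrative MET values only, not taken from the Compendium.
for met in (1.5, 3.0, 6.0, 8.0):
    print(met, intensity_class(met))
```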
4. Naive Bayes
- Scheme:weka.classifiers.bayes.NaiveBayes
5. kNN
- KNN = 7 (number of neighbours)
- Scheme:weka.classifiers.lazy.IBk -K 7 -W 0
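The kNN configuration listed above (7 neighbours) amounts to a majority vote among the 7 nearest training samples. A minimal sketch of that rule follows. It is not Weka's IBk: it uses plain Euclidean distance without attribute normalization (so features are assumed to be scaled beforehand), and its tie-breaking rule is a hypothetical choice.

```python
import math


def knn_predict(train_X, train_y, x, k=7):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted((math.dist(xi, x), i) for i, xi in enumerate(train_X))[:k]
    counts, first_rank = {}, {}
    for rank, (_, i) in enumerate(nearest):
        label = train_y[i]
        counts[label] = counts.get(label, 0) + 1
        first_rank.setdefault(label, rank)
    # hypothetical tie-break: prefer the label whose closest member ranks first
    return max(counts, key=lambda lbl: (counts[lbl], -first_rank[lbl]))


# Toy 1-D feature values (already scaled), two classes of 4 samples each.
train_X = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3]]
train_y = ["sit"] * 4 + ["run"] * 4
print(knn_predict(train_X, train_y, [0.15]))  # 4 "sit" vs 3 "run" among the 7 nearest
```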
Table 4.6: Benchmark on the PAMAP2 dataset: performance measures on the ‘Intensity estimation task’.
Standard 9-fold cross-validation LOSO 9-fold cross-validation
Classifier Precision Recall F-measure Accuracy Precision Recall F-measure Accuracy
C4.5 decision tree 0.9796 0.9783 0.9789 0.9823 0.9490 0.9364 0.9426 0.9526
Boosted C4.5 0.9989 0.9983 0.9986 0.9988 0.9472 0.9564 0.9518 0.9587
Bagging C4.5 0.9853 0.9809 0.9831 0.9866 0.9591 0.9372 0.9480 0.9552
Naive Bayes 0.9157 0.8553 0.8845 0.9310 0.8986 0.8526 0.8750 0.9251
kNN 0.9985 0.9987 0.9986 0.9982 0.9488 0.9724 0.9604 0.9666
Table 4.7: Benchmark on the PAMAP2 dataset: performance measures on the ‘Basic activity recognition task’.
Standard 9-fold cross-validation LOSO 9-fold cross-validation
Classifier Precision Recall F-measure Accuracy Precision Recall F-measure Accuracy
C4.5 decision tree 0.9968 0.9968 0.9968 0.9970 0.9349 0.9454 0.9401 0.9447
Boosted C4.5 0.9997 0.9994 0.9995 0.9995 0.9764 0.9825 0.9794 0.9785
Bagging C4.5 0.9971 0.9968 0.9970 0.9971 0.9346 0.9439 0.9392 0.9433
Naive Bayes 0.9899 0.9943 0.9921 0.9923 0.9670 0.9737 0.9703 0.9705
kNN 1.0000 1.0000 1.0000 1.0000 0.9955 0.9922 0.9938 0.9932
4.4 Benchmark of Physical Activity Monitoring
Table 4.8: Benchmark on the PAMAP2 dataset: performance measures on the ‘Background activity recognition task’.
Standard 9-fold cross-validation LOSO 9-fold cross-validation
Classifier Precision Recall F-measure Accuracy Precision Recall F-measure Accuracy
C4.5 decision tree 0.9784 0.9701 0.9743 0.9709 0.8905 0.8635 0.8768 0.8722
Boosted C4.5 0.9991 0.9979 0.9985 0.9980 0.9559 0.9310 0.9433 0.9377
Bagging C4.5 0.9881 0.9766 0.9823 0.9787 0.9160 0.8937 0.9047 0.9042
Naive Bayes 0.8905 0.9314 0.9105 0.8508 0.8818 0.8931 0.8874 0.8308
kNN 0.9982 0.9966 0.9974 0.9957 0.9428 0.9458 0.9443 0.9264
Table 4.9: Benchmark on the PAMAP2 dataset: performance measures on the ‘All activity recognition task’.
Standard 9-fold cross-validation LOSO 9-fold cross-validation
Classifier Precision Recall F-measure Accuracy Precision Recall F-measure Accuracy
C4.5 decision tree 0.9554 0.9563 0.9558 0.9546 0.8376 0.8226 0.8300 0.8244
Boosted C4.5 0.9974 0.9973 0.9974 0.9969 0.8908 0.8947 0.8928 0.8796
Bagging C4.5 0.9660 0.9674 0.9667 0.9666 0.8625 0.8489 0.8556 0.8554
Naive Bayes 0.9419 0.9519 0.9469 0.9438 0.8172 0.8561 0.8362 0.8365
kNN 0.9946 0.9937 0.9942 0.9925 0.9123 0.9097 0.9110 0.8924
60 4 Data Processing and Classification
background activities overlap with some of the basic activity classes to be recognized.
This issue will be further investigated in Chapter 5.
Altogether, good performance is achieved on all 4 classification tasks: approximately 90% or more with the best performing classifiers. However, the benchmark defines two important challenges, where more advanced approaches in future work should improve the performance. On the one hand, when the number of activities to be recognized is increased while the same sensor set is kept, the difficulty of the task exceeds the potential of standard methods. This applies not only to the task ‘all’, but to the ‘background’ task as well: by introducing an additional other activity class for all the background activities, the complexity of the classification problem increases significantly, and thus the performance of the same standard approaches drops. On the other hand, when comparing classification performance individually for the 9 subjects, a high variance can be observed, and this variance grows strongly with task complexity: the individual performance on the ‘basic’ task (using the boosted decision tree classifier) varies between 93.99% and 100%, while on the ‘all’ task it varies between 74.02% and 100%. Therefore, especially on the more difficult classification problems, personalization approaches (subject dependent training) could significantly improve on the results of the benchmark.
4.5 Conclusion
This chapter presented data processing methods and classification algorithms for
physical activity monitoring. A data processing chain is defined including prepro-
cessing, segmentation, feature extraction and classification steps. For the first three
steps common approaches are used in this thesis. For the classification step, different
algorithms are introduced and compared. First preliminary studies are carried out
with a wide range of classifiers using the PAMAP dataset. Moreover, a benchmark
is given in this chapter by applying 5 selected classifiers on 4 defined classification
tasks.
The presented results mainly serve to characterize the difficulty of the different tasks. The benchmark reveals some challenges in physical activity monitoring, which will be addressed in the next chapters. For example, it shows that complex activity recognition tasks exceed the potential of existing approaches. This motivates the introduction of new classification algorithms, as presented in Chapter 6. Moreover, the large variance of the individual (per-subject) classification performance motivates novel personalization approaches, as discussed in Chapter 7.
The definition and benchmark of classification problems including the 6 optional activities from the PAMAP2 dataset remain for future work. Furthermore, it should be noted that a post-processing step is not included in the DPC as defined in this chapter. Therefore, no temporal information is taken into account when classifying activities. The reason is that when a protocol is followed during data collection, the order in which different activities follow each other has no practical meaning. In real-life situations, however, patterns in the order of performed activities exist: for example, driving a car is usually preceded and followed by walking, and not, e.g., by sitting, and especially not by ironing clothes. To capture such patterns, datasets recorded directly from subjects’ everyday life have to be created. Then, methods such as HMMs can be applied to model the transitions between different types of physical activities. However, this problem exceeds the scope of this thesis.
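To illustrate how an HMM could exploit such ordering patterns, the following sketch applies Viterbi decoding to per-window class likelihoods. Everything in it is hypothetical: the three states, the transition probabilities and the per-window scores are made-up numbers, and this smoothing step is not part of the DPC used in this thesis.

```python
import math


def viterbi(obs_probs, trans, prior):
    """Most likely state sequence for per-window observation likelihoods.

    obs_probs[t][s]: likelihood of window t under state s (all > 0)
    trans[a][b]:     probability of moving from state a to state b (all > 0)
    prior[s]:        probability of starting in state s (all > 0)
    """
    states = range(len(prior))
    # work with log-probabilities to avoid underflow on long sequences
    score = [math.log(prior[s]) + math.log(obs_probs[0][s]) for s in states]
    backpointers = []
    for obs in obs_probs[1:]:
        new_score, pointers = [], []
        for b in states:
            best_a = max(states, key=lambda a: score[a] + math.log(trans[a][b]))
            new_score.append(score[best_a] + math.log(trans[best_a][b]) + math.log(obs[b]))
            pointers.append(best_a)
        score = new_score
        backpointers.append(pointers)
    # backtrack from the best final state
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


STATES = ["walk", "drive car", "iron"]
trans = [
    [0.80, 0.10, 0.10],  # from walk
    [0.10, 0.89, 0.01],  # from drive car
    [0.10, 0.01, 0.89],  # from iron
]
prior = [1 / 3, 1 / 3, 1 / 3]
obs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.4, 0.5],  # on its own, this window looks most like "iron"
    [0.1, 0.8, 0.1],
]
print([STATES[s] for s in viterbi(obs, trans, prior)])
```

Window by window, the third window would be labeled iron, but the low transition probability between drive car and iron makes the decoder keep drive car for the whole sequence.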
5 Robust Activity Monitoring for Everyday Life: Methods and Evaluation
5.1 Introduction
In the literature, the monitoring of physical activities under realistic, everyday life conditions, i.e. while individuals follow their regular daily routine, is usually neglected or even completely ignored. Therefore, this chapter investigates the development and evaluation of robust methods for everyday life scenarios, focusing on the tasks of aerobic activity recognition and intensity estimation. Two important aspects of robustness are investigated: dealing with various (unknown) other activities, and subject independence; both are explained in more detail in the next subsections. Methods to handle these issues are proposed and compared. The usage of activity monitoring applications in common everyday scenarios is thoroughly evaluated in simulations. Moreover, a new evaluation technique (leave-one-activity-out, LOAO) is introduced to simulate the use of an activity monitoring system while a previously unknown activity is performed. By applying the proposed methods, it is possible to design a robust physical activity monitoring system with the desired generalization characteristics.
The outline of this chapter is as follows. The current section describes the problem statement related to the other activities and subject independence. Section 5.2
defines the basic conditions (classification problems, data processing and classifica-
tion methods) of the experiments carried out in this chapter. Section 5.3 proposes
four different models for dealing with the other activities in the activity recognition
classification task. The measures used to quantify the classification performance of
the different approaches are defined in Section 5.4, adjusted to the focus of the ac-
tivity recognition and intensity estimation tasks. Section 5.5 presents the evaluation
techniques used in the experiments in this chapter. Results on each of the defined
classification tasks are presented and discussed in Section 5.6. A detailed analysis of
the results is supported by various confusion matrices achieved with different com-
binations of classifier, other activity model and evaluation technique. Finally, the
developed methods and obtained results are summarized in Section 5.7.
Example 5.1
Here a practical use case is given where introducing and dealing with other activities would be beneficial. The authors of [196] present an approach for energy-efficient continuous activity recognition on mobile phones by introducing the ‘A3R’ (Adaptive Accelerometer-based Activity Recognition) strategy. In A3R, both the accelerometer sampling frequency and the choice of the classification features are adapted in real time, based on the currently performed activity. However, the A3R strategy goes into an unknown state when it is not confident enough in the estimation of the activity class. In this unknown state, the energy consumption is the highest (maximum sampling rate and all features used). Yan et al. [196] noted that in their in-situ Android study, users performed many other activities beyond the 6 labeled ones, causing the unknown state to appear more frequently and thus resulting in higher energy consumption.
With an additional other activity class, covering a large number of usually performed
other activities and assigning an adequate sampling rate and feature set to it, the un-
known state would occur less frequently, thus reducing the overall energy consump-
tion of the mobile application.
The above mentioned approaches represent a first important step towards dealing with various other activities. However, they only handle a given set of other activities (the entire set of other activities is known when developing the system), and thus neglect to simulate the scenario, important in practice, in which the user of the system performs an activity previously unknown to the system. Therefore, it remains an open question what happens to all the activities not considered during the monitoring system’s development. To give a concrete example, assume that an activity monitoring system has the goal to recognize 5 basic physical activities (walk, run, etc.). When developing this system, 10 other activities are considered in addition to the activities to be recognized (vacuum clean, play soccer, etc.). The system is specified so that if a user performs any of these other activities, it is not recognized as a basic activity, but as an other activity, or it is rejected. Furthermore, assume that the activity ‘rope jump’ is included neither in the basic nor in the other activities. Therefore, it is undefined how the system handles the situation when a user performs this ‘rope jump’ activity. By not dealing with this issue, existing approaches basically leave two possibilities: either the user is limited to scenarios where only the considered activities occur (even if 20 to 30 different activities are included in the development of a system, this is still a significant limitation for the user), or the user is permitted to perform any kind of physical activity, but it is not specified how the monitoring system handles an activity not considered during the system’s development phase (e.g. whether it is recognized as one of the basic activities). Either way, neglecting this issue significantly limits the applicability of an activity monitoring application.
although they present high performance using their approach, these results might have less practical meaning than if subject independent validation had been applied. Further results comparing subject dependent and independent evaluation techniques will be shown in this chapter.
and are also used to simulate the scenario when users perform an activity unknown to the system. This defined classification problem will be referred to as the ‘extended’ activity recognition task throughout this chapter. Moreover, for comparison, the classification problem including only the 6 basic activity classes will also be used, and will be referred to as the ‘basic’ activity recognition task.
The defined activity recognition tasks focus on the monitoring of traditionally recommended aerobic activities (walk, run, cycle and Nordic walk), and can thus be justified by various physical activity recommendations, such as those given in [66]. Patients with diabetes, obesity or cardiovascular disease, in particular, are often required to follow a well-defined exercise routine as part of their treatment. Therefore, the recognition of these basic physical activities is essential to monitor the patients’ progress and give feedback to their caregivers. Moreover, a summary of resting activities (lie, sit and stand still) also gives feedback on how much sedentary activity the patients “performed”. However, in other use cases the focus of an activity monitoring application could be different, and thus the definition of the classification problem (i.e. of the basic and other activity classes) would differ. Nevertheless, the methods presented in this chapter could be applied to those other classification tasks as well.
Apart from the activity recognition tasks presented above, an intensity estimation
classification task is also defined on the PAMAP2 dataset. This task will be referred
to as the ‘intensity’ classification problem, and will be used to demonstrate the neces-
sity of subject independent evaluation and to simulate the estimation of the intensity
of previously unknown activities. The ‘intensity’ task includes all 18 activities from the PAMAP2 dataset; the goal is to distinguish activities of light, moderate and vigorous effort. The ground truth for this rough intensity estimation task is based on the metabolic equivalents (METs) of the different physical activities, as provided by [1].
Therefore, the 3 intensity classes are defined as follows: lie, sit, stand, watch TV, com-
puter work, drive car, iron, fold laundry and clean house are regarded as activities of
light effort (< 3.0 METs); walk, cycle, Nordic walk, descend stairs and vacuum clean
as activities of moderate effort (3.0-6.0 METs); run, ascend stairs, play soccer and
rope jump as activities of vigorous effort (> 6.0 METs). Overall, the ‘intensity’ task is
regarded as a 3-class classification problem in this chapter.
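The class assignment above can be expressed as a lookup table that turns per-window activity annotations into ‘intensity’ labels. The string labels are illustrative (the dataset's own activity encoding may differ); the class membership is exactly the one listed above.

```python
# Intensity classes of the 18 activities of the 'intensity' task, as listed above.
# The string labels are illustrative; the dataset's own activity encoding may differ.
INTENSITY_CLASSES = {
    "light": ["lie", "sit", "stand", "watch TV", "computer work", "drive car",
              "iron", "fold laundry", "clean house"],
    "moderate": ["walk", "cycle", "Nordic walk", "descend stairs", "vacuum clean"],
    "vigorous": ["run", "ascend stairs", "play soccer", "rope jump"],
}

# Inverted lookup: activity -> intensity class
ACTIVITY_TO_INTENSITY = {
    activity: level
    for level, activities in INTENSITY_CLASSES.items()
    for activity in activities
}


def intensity_labels(activity_labels):
    """Replace per-window activity annotations by 'intensity' ground truth labels."""
    return [ACTIVITY_TO_INTENSITY[a] for a in activity_labels]


print(intensity_labels(["walk", "iron", "rope jump"]))
```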
The PAMAP2 dataset provides raw sensory data from the 3 IMUs and the heart rate monitor, which first need to be processed before they can be used by classification algorithms. A data processing chain including preprocessing, segmentation and feature extraction steps is applied to the raw data (these data processing steps are further described in Section 4.2). In total, 137 features are extracted from the raw signal:
133 features from IMU acceleration data (such as mean, standard deviation, energy,
entropy, correlation, etc.) and 4 features from heart rate data (mean and gradient).
These extracted features serve as input for the classification step, together with the
activity class labels provided by the dataset.
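The sketch below computes a few of the listed feature types for one window of acceleration data and the corresponding heart rate samples. The exact feature definitions are those of Section 4.2; the versions used here (energy as the sum of squared DFT magnitudes without the DC component divided by the window length, entropy of the normalized DFT magnitudes, least-squares slope as heart rate gradient) are common choices assumed for illustration, and the naive DFT stands in for an FFT.

```python
import cmath
import math


def dft_magnitudes(x):
    """|X_k| for k = 0..n-1 via a naive O(n^2) DFT (an FFT would be used in practice)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def mean(x):
    return sum(x) / len(x)


def std(x):
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def energy(x):
    # assumed definition: squared DFT magnitudes without DC, divided by window length
    return sum(m * m for m in dft_magnitudes(x)[1:]) / len(x)


def spectral_entropy(x):
    # assumed definition: entropy (bits) of the normalized DFT magnitudes, DC excluded
    mags = dft_magnitudes(x)[1:]
    total = sum(mags)
    if total == 0:
        return 0.0
    return -sum((m / total) * math.log2(m / total) for m in mags if m > 0)


def correlation(x, y):
    # Pearson correlation, e.g. between two acceleration axes
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm if norm else 0.0


def gradient(hr):
    # assumed definition: least-squares slope of heart rate per sample
    t_mean = (len(hr) - 1) / 2
    h_mean = mean(hr)
    num = sum((t - t_mean) * (h - h_mean) for t, h in enumerate(hr))
    den = sum((t - t_mean) ** 2 for t in range(len(hr)))
    return num / den


def window_features(acc_x, acc_y, hr):
    """A handful of features for one window; the thesis extracts 137 in total."""
    return {
        "acc_x_mean": mean(acc_x),
        "acc_x_std": std(acc_x),
        "acc_x_energy": energy(acc_x),
        "acc_x_entropy": spectral_entropy(acc_x),
        "acc_xy_corr": correlation(acc_x, acc_y),
        "hr_mean": mean(hr),
        "hr_gradient": gradient(hr),
    }
```

By Parseval's theorem, this energy equals the window length times the (population) variance of the window, which is a convenient sanity check.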
Previous work in physical activity monitoring showed that decision tree based classifiers, especially boosted decision trees, usually achieve high performance (cf. e.g. [137] or the benchmark results in Section 4.4). Moreover, decision tree based classifiers have the benefit of being fast classification algorithms with a simple structure, and are thus also easy to implement. These benefits are especially important for physical activity monitoring applications, since they usually run on mobile, portable systems for everyday usage, where the available computational power is limited. Therefore, the C4.5 decision tree classifier [126] and the AdaBoost.M1 algorithm [55] (using the C4.5 decision tree as weak learner) are used and compared in the experiments on the defined classification problems.
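A compact sketch of the AdaBoost.M1 reweighting scheme [55] follows. To stay self-contained, it uses a weighted decision stump as weak learner instead of C4.5, so it illustrates the boosting mechanism only and is not the Weka configuration used in the experiments; all function names are hypothetical.

```python
import math


def weighted_majority(indices, y, w, classes):
    """Class with the largest total weight among the given samples."""
    votes = {c: 0.0 for c in classes}
    for i in indices:
        votes[y[i]] += w[i]
    return max(classes, key=lambda c: votes[c])


def predict_stump(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def fit_stump(X, y, w, classes):
    """Weak learner: one threshold on one feature, a class label on each side."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left = [i for i, x in enumerate(X) if x[f] <= thr]
            right = [i for i, x in enumerate(X) if x[f] > thr]
            stump = (f, thr, weighted_majority(left, y, w, classes),
                     weighted_majority(right, y, w, classes))
            err = sum(w[i] for i, x in enumerate(X) if predict_stump(stump, x) != y[i])
            if err < best_err:
                best, best_err = stump, err
    return best


def adaboost_m1(X, y, rounds=10):
    classes = sorted(set(y))
    w = [1.0 / len(X)] * len(X)  # start with uniform sample weights
    ensemble = []                # (vote weight log(1/beta), stump) pairs
    for _ in range(rounds):
        stump = fit_stump(X, y, w, classes)
        wrong = [predict_stump(stump, x) != t for x, t in zip(X, y)]
        eps = sum(wi for wi, bad in zip(w, wrong) if bad)
        if eps > 0.5:
            break  # AdaBoost.M1 stops when the weighted error exceeds 1/2
        if eps == 0:
            ensemble.append((math.log(1e10), stump))  # perfect stump: large finite vote
            break
        beta = eps / (1 - eps)
        ensemble.append((math.log(1 / beta), stump))
        # shrink the weights of correctly classified samples, then renormalize
        w = [wi if bad else wi * beta for wi, bad in zip(w, wrong)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble, classes


def predict(model, x):
    ensemble, classes = model
    votes = {c: 0.0 for c in classes}
    for alpha, stump in ensemble:
        votes[predict_stump(stump, x)] += alpha
    return max(classes, key=lambda c: votes[c])


X = [[1], [2], [3], [4], [5], [6]]
y = ["a", "a", "b", "b", "c", "c"]
model = adaboost_m1(X, y, rounds=3)
print([predict(model, x) for x in X])
```

On this toy 3-class problem, a single stump labels at most 4 of the 6 samples correctly, while three boosting rounds already reproduce all training labels.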
Figure 5.1: The 4 proposed models for dealing with the other activities.
create the first level of this classifier, while the second level consists of 6 sub-
classifiers: each created using the respective basic activity class and all samples
from the other activities.
Table 5.1: Confusion matrix used for the adjusted definition of the performance
measures for the activity recognition tasks.
Annotated Recognized activity
activity 1 2 ... C 0
1 P1,1 P1,2 ... P1,C P1,C+1 S1
2 P2,1 P2,2 ... P2,C P2,C+1 S2
...
C PC,1 PC,2 ... PC,C PC,C+1 SC
C+1 PC+1,1 PC+1,2 ... PC+1,C PC+1,C+1 SC+1
...
C+B PC+B,1 PC+B,2 ... PC+B,C PC+B,C+1 SC+B
R1 R2 ... RC RC+1
Table 5.2: General confusion matrix of the intensity estimation task, using an-
notated intensity classes.
Annotated Estimated intensity
intensity light moderate vigorous
light P1,1 P1,2 P1,3 S1
moderate P2,1 P2,2 P2,3 S2
vigorous P3,1 P3,2 P3,3 S3
R1 R2 R3
(‘allSeparate’ model), or classified into the other activity class (‘bgClass’ model), or rejected before or after the classification of the basic activities (‘preReject’ or ‘postReject’ model, respectively) are counted in this null class. Using this notation, the performance measures precision and recall are defined as follows:
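$$\text{precision} = \frac{1}{C}\sum_{i=1}^{C}\frac{P_{i,i}}{R_i} \tag{5.1a}$$

$$\text{recall} = \frac{1}{C}\sum_{i=1}^{C}\frac{P_{i,i}}{S_i} \tag{5.1b}$$

$$\text{F-measure} = 2\cdot\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}} \tag{5.1c}$$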
Finally, since only the correct classification of the basic activities is of interest, the measure accuracy is defined as follows:
$$\text{accuracy} = \frac{1}{N-\sum_{j=C+1}^{C+B} P_{j,C+1}}\sum_{i=1}^{C} P_{i,i} \tag{5.1d}$$
The adjusted performance measures introduced above can be applied to both activity recognition tasks of this chapter. It should be noted that since the ‘basic’ task does not include any other activities, the adjusted measures reduce to the original measures of Section 4.3.1. Concrete confusion matrices for the defined ‘basic’ and ‘extended’ classification problems are shown as results in Section 5.6.1 and Section 5.6.2, respectively. Moreover, these confusion matrices are used to understand the results in more detail and to compare different approaches.
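The adjusted measures of (5.1) can be computed directly from a confusion matrix laid out as in Table 5.1. The sketch below is illustrative: rows and columns are zero-indexed, so the null class is column C, and the counts are made up.

```python
def adjusted_measures(cm, n_basic):
    """Adjusted precision, recall, F-measure and accuracy as in (5.1).

    cm has C + B rows (C basic activities, then B other activities, as annotated)
    and C + 1 columns (C basic activities, then the null class, as recognized).
    Assumes every basic class is annotated and recognized at least once.
    """
    C = n_basic
    N = sum(sum(row) for row in cm)
    R = [sum(row[i] for row in cm) for i in range(C)]  # column sums R_i
    S = [sum(cm[i]) for i in range(C)]                 # row sums S_i
    precision = sum(cm[i][i] / R[i] for i in range(C)) / C
    recall = sum(cm[i][i] / S[i] for i in range(C)) / C
    f_measure = 2 * precision * recall / (precision + recall)
    # other-activity samples correctly assigned to the null class are left out
    nulled_other = sum(cm[j][C] for j in range(C, len(cm)))
    accuracy = sum(cm[i][i] for i in range(C)) / (N - nulled_other)
    return precision, recall, f_measure, accuracy


# Made-up counts: 2 basic activities, 1 other activity, last column = null class.
cm = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
print(adjusted_measures(cm, n_basic=2))
```

With no other-activity rows and an empty null column, the function returns the ordinary class-averaged measures, in line with the remark about the ‘basic’ task.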
Table 5.3: General confusion matrix of the intensity estimation task, using an-
notated activity classes.
Annotated Estimated intensity
activity light moderate vigorous
1 P1,1 P1,2 P1,3
2 P2,1 P2,2 P2,3
...
L PL,1 PL,2 PL,3
L+1 PL+1,1 PL+1,2 PL+1,3
...
L+M PL+M,1 PL+M,2 PL+M,3
L+M+1 PL+M+1,1 PL+M+1,2 PL+M+1,3
...
L+M+V PL+M+V,1 PL+M+V,2 PL+M+V,3
R1 R2 R3
$S_i$ refers to the number of samples annotated with intensity class $i$ (light, moderate or vigorous), and $R_j$ refers to the number of samples estimated as intensity class $j$. Using this notation, the 4 performance measures are defined as follows:
The drawback of the representation of Table 5.2 is that only the confusion between the 3 intensity classes is shown; it gives no information about, e.g., which specific activities have their intensity estimated inaccurately. A more detailed representation can be given by using the annotated activity classes in the confusion matrix, as shown in Table 5.3. For this representation, assume that the ‘intensity’ task consists of L light effort activities (rows 1, . . . , L of Table 5.3), M moderate effort activities (rows L+1, . . . , L+M) and V vigorous effort activities (rows L+M+1, . . . , L+M+V). With slight abuse of notation, L, M and V also denote the corresponding sets of activities, and A = L ∪ M ∪ V is the set of all different activities. It is worth noting that the confusion matrix of Table 5.2 can be considered as the result of merging, in Table 5.3, all rows of activities belonging to the same intensity level.
5.5 Evaluation Techniques 73
Using the definitions of (5.3) and (5.4), the performance measures precision and
recall can be defined as follows:
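$$\text{precision} = \frac{1}{3}\left(\frac{\sum_{i\in L} P_{i,1}}{R_1} + \frac{\sum_{i\in M} P_{i,2}}{R_2} + \frac{\sum_{i\in V} P_{i,3}}{R_3}\right) \tag{5.5a}$$

$$\text{recall} = \frac{1}{3}\left(\frac{\sum_{i\in L} P_{i,1}}{S_1} + \frac{\sum_{i\in M} P_{i,2}}{S_2} + \frac{\sum_{i\in V} P_{i,3}}{S_3}\right) \tag{5.5b}$$

The definition of F-measure remains unaltered:

$$\text{F-measure} = 2\cdot\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}} \tag{5.5c}$$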
Results on the ‘intensity’ classification task are shown in Section 5.6.3, using the
representation form of Table 5.3 and the performance measures as defined by (5.5).
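As an illustration, the measures of (5.5) can be computed from a per-activity confusion matrix in the form of Table 5.3, given the annotated intensity level of each activity. The sketch covers precision, recall and F-measure only, and its counts are made up.

```python
def intensity_measures(cm, true_intensity, n_levels=3):
    """Precision, recall and F-measure as in (5.5).

    cm[a][k]: samples of annotated activity a estimated as intensity k
              (0 = light, 1 = moderate, 2 = vigorous), i.e. one row of Table 5.3.
    true_intensity[a]: annotated intensity index of activity a.
    """
    levels = range(n_levels)
    R = [sum(row[k] for row in cm) for k in levels]  # estimated as level k
    S = [sum(sum(row) for a, row in enumerate(cm) if true_intensity[a] == k) for k in levels]
    # numerators of (5.5): correct estimates summed over the activities of each level
    hits = [sum(row[k] for a, row in enumerate(cm) if true_intensity[a] == k) for k in levels]
    precision = sum(hits[k] / R[k] for k in levels) / n_levels
    recall = sum(hits[k] / S[k] for k in levels) / n_levels
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up counts for 4 activities: lie (light), walk (moderate), run and rope jump (vigorous).
cm = [
    [10, 0, 0],
    [1, 8, 1],
    [0, 2, 8],
    [0, 0, 10],
]
true_intensity = [0, 1, 2, 2]
print(intensity_measures(cm, true_intensity))
```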
Concretely, simulating everyday life scenarios means simulating how the created system behaves when used by a person unknown at training time, or when a previously unknown activity is performed. To simulate subject independence, the leave-one-subject-out (LOSO) CV evaluation technique is applied. Since the PAMAP2 dataset provides data from 9 subjects, LOSO 9-fold CV is applied in the experiments of this chapter. Moreover, to simulate the scenario of performing unknown other activities, a new evaluation technique is introduced: leave-one-activity-out (LOAO). The basic idea of LOAO is similar to that of the LOSO evaluation technique. However, the concrete definition of LOAO for the activity recognition and intensity estimation tasks of this chapter is described in more detail in the next two subsections.
• The system is trained with data from a large number of subjects for the ‘extended’ task. Then the system is deployed to a new subject (thus no data from this subject was available during the training phase of the system), and the new subject performs one of the basic activities (estimated through the LOSO component).
• The system is trained with data from a large number of subjects for the ‘extended’ task. Then the system is deployed to a new subject, who performs one of the known other activities (estimated through the LOSO component). This is the first step in testing the robustness of the system in situations when the user performs activities other than the few recognized basic ones.
• The system is trained with data from a large number of subjects for the ‘extended’ task. Then the system is deployed to a new subject, who performs a previously unknown activity, i.e. an activity belonging neither to the basic activity classes nor to the other activities available during the training phase (estimated through the LOOAO component). This scenario basically simulates the generalization characteristics of the classifier’s other activity model, estimating how robust the system is in the usually neglected situation when unknown activities are performed.
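A hypothetical sketch of how train/test splits for the third scenario could be generated: the held-out data comes from one unseen subject performing one other activity that is removed from the training data of all subjects. This is a simplified reading of the bullet points above, not a reproduction of Algorithm 5.1; the activity labels are plain strings chosen for illustration.

```python
def loso_loao_folds(subjects, activities, other_activities):
    """Yield (train, test) index lists where both the test subject and the
    test activity are absent from the training data.

    Hypothetical simplification of the third scenario above: the basic
    activities are never left out, only one of the other activities at a time.
    """
    samples = list(enumerate(zip(subjects, activities)))
    for subject in sorted(set(subjects)):
        for left_out in sorted(other_activities):
            train = [i for i, (s, a) in samples if s != subject and a != left_out]
            test = [i for i, (s, a) in samples if s == subject and a == left_out]
            if test:  # a subject may not have performed every activity
                yield train, test


subjects = [1, 1, 1, 2, 2, 2, 3, 3]
activities = ["walk", "iron", "rope jump", "walk", "iron", "rope jump", "walk", "iron"]
folds = list(loso_loao_folds(subjects, activities, {"iron", "rope jump"}))
```

Here subject 3 did not perform rope jump, so only 5 of the 6 possible subject/activity combinations produce a fold.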
Table 5.4: Confusion matrix on the ‘basic’ task using the C4.5 decision tree clas-
sifier and standard CV evaluation technique. The table shows how different an-
notated activities are classified in [%].
Annotated Recognized activity
activity 1 2 3 4 5 6
1 lie 100 0 0 0 0 0
2 sit/stand 0 99.87 0 0 0.13 0
3 walk 0 0.05 99.66 0 0.02 0.27
4 run 0 0 0 100 0 0
5 cycle 0 0.29 0.25 0.11 99.29 0.06
6 Nordic walk 0 0 0.60 0 0 99.40
both standard CV and LOSO evaluation. Each of the tests is performed 10 times; the table shows the mean and standard deviation of these 10 test runs.
The results of Table 5.5 show the significant difference between standard CV and LOSO for both classifiers. Table 5.6 shows the confusion matrix on the ‘basic’ task using the C4.5 decision tree and LOSO as evaluation technique. Comparing the confusion matrices of Table 5.4 (F-measure: 99.71%) and Table 5.6 (F-measure: 95.50%), the differences between the results obtained with standard CV and LOSO can be observed in more detail. The recognition rate of all 6 activities decreases with LOSO evaluation; the decrease is most significant for the activity Nordic walk, where the performance drops from 99.40% to 83.19%. The lower performance can be explained by the diversity in how subjects perform physical activities (e.g. the differing pattern and intensity of arm movements during the activities walk and Nordic walk across subjects, which leads to the significant confusion between these two activities in Table 5.6). Subject independent evaluation simulates this behaviour, while subject dependent evaluation ignores it. Therefore, the latter method leads to highly “optimistic” results, as observed in Table 5.5 and as will be shown in Table 5.7 and Table 5.12 for the classification tasks ‘extended’ and ‘intensity’, respectively.
An interesting result in Table 5.5 is that the AdaBoost.M1 classifier only slightly outperforms the C4.5 classifier on the ‘basic’ task (the difference between the two classifiers on the ‘extended’ task is much more significant, as shown in the next subsection). This can be explained by the fact that the ‘basic’ task is a rather simple classification problem, where even base-level classifiers can reach the highest possible accuracy. Therefore, it is not necessarily worth using more complex classification algorithms here. The lower performance with LOSO evaluation is due to the difficulty of generalizing across users, and not due to the difficulty of the classification task.
Although using subject independent evaluation is the first step towards simulat-
ing the conditions of everyday usage of activity monitoring applications, the ‘basic’
task only estimates the system’s behaviour when activities of one of the 6 included ac-
tivity classes are performed, thus the system’s response is not defined when the user
performs activities such as descend stairs or vacuum clean. This issue is discussed in
the next subsection, by analyzing the results obtained on the ‘extended’ task.
Robust Activity Monitoring for Everyday Life: Methods and Evaluation
Table 5.5: Performance measures on the ‘basic’ activity recognition task. The results are averaged over 10 test runs; mean and standard deviation are given for each experimental setup.
Classifier Evaluation method Precision Recall F-measure Accuracy
C4.5 standard CV 99.71 ± 0.04 99.70 ± 0.02 99.71 ± 0.03 99.71 ± 0.03
LOSO 96.05 ± 1.06 94.96 ± 1.40 95.50 ± 1.20 95.14 ± 1.10
AdaBoost.M1 standard CV 99.97 ± 0.02 99.97 ± 0.02 99.97 ± 0.02 99.97 ± 0.02
LOSO 95.91 ± 1.45 95.47 ± 1.45 95.69 ± 1.40 95.43 ± 1.54
5.6 Results and Discussion 79
Table 5.6: Confusion matrix on the ‘basic’ task using the C4.5 decision tree clas-
sifier and LOSO CV evaluation technique. The table shows how different anno-
tated activities are classified in [%].
Annotated Recognized activity
activity 1 2 3 4 5 6
1 lie 97.56 2.33 0 0 0.12 0
2 sit/stand 0.04 98.24 0.80 0 0.93 0
3 walk 0 2.58 94.97 0 0.75 1.71
4 run 0 0 0 98.24 1.69 0.07
5 cycle 0 1.94 0.29 0.14 97.56 0.07
6 Nordic walk 0 0 16.79 0.02 0 83.19
The performance measures on the ‘extended’ task are presented in Table 5.7 for each of the 4 other activity models, using the 2 classifiers and the 3 different evaluation techniques. The results are given in the form of the mean and standard deviation of the 10 test runs performed for every possible combination of models, classifiers and evaluation methods. Overall, it is clear that the inclusion of the other activities makes the classification task significantly more difficult (cf. the comparison of the results achieved with standard CV and LOSO to the respective results on the ‘basic’ task in Table 5.5). This can be explained not only by the increased number of activities in the classification problem (it should be noted that the defined performance measures for the ‘extended’ task only focus on the basic activity classes, thus the results are comparable with those of the ‘basic’ task), but also by the fact that the characteristics of some of the introduced other activities overlap with those of some of the basic activity classes. For example, the other activity iron has similar characteristics to talking and gesticulating while standing, thus misclassifications appear between these two activities. Similarly, it is nontrivial to distinguish running with a ball (during the other activity play soccer) from just running. Since the ‘extended’ task defines a complex classification problem, it is worthwhile to apply more complex classification algorithms here, contrary to the ‘basic’ classification task. For example, when considering the ‘allSeparate’ model and LOSO evaluation, the C4.5 decision tree only achieves an F-measure of 83.30%, while 92.22% can be reached with the AdaBoost.M1 classifier.
From the results of Table 5.7 it is clear that the performance measures achieved with LOSO evaluation are significantly lower than the results obtained with standard CV, as already seen in Table 5.5 and explained in Section 5.6.1. When only subject independence is considered, the ‘allSeparate’ model performs best with the AdaBoost.M1 classifier, closely followed by the models ‘preReject’ and ‘bgClass’. However, the ‘extended’ task also simulates the case when the user of the system performs unknown other activities (LOOAO). The results of applying the evaluation method of Algorithm 5.1 are shown in Table 5.7 in the respective LOSO_LOOAO rows. Considering this combined evaluation technique, the ‘bgClass’ model performs best, followed by the models ‘preReject’ and ‘allSeparate’. Of all 4 other activity models, the ‘allSeparate’ model shows the largest decrease
Table 5.7: Performance measures on the ‘extended’ activity recognition task. The results are averaged over 10 test runs; mean and standard deviation are given for each experimental setup.
Model Classifier Evaluation method Precision Recall F-measure Accuracy
’allSeparate’ C4.5 standard CV 98.17 ± 0.23 98.00 ± 0.09 98.09 ± 0.14 95.80 ± 0.25
LOSO 89.77 ± 1.89 77.75 ± 3.08 83.30 ± 2.10 73.81 ± 2.21
LOSO_LOOAO 81.84 ± 1.77 78.59 ± 3.43 80.16 ± 2.44 67.06 ± 2.71
AdaBoost.M1 standard CV 99.94 ± 0.01 99.93 ± 0.04 99.93 ± 0.02 99.83 ± 0.05
LOSO 95.42 ± 0.98 89.23 ± 2.00 92.22 ± 1.40 86.60 ± 2.09
LOSO_LOOAO 86.80 ± 0.99 88.72 ± 1.28 87.75 ± 1.07 78.83 ± 1.29
’bgClass’ C4.5 standard CV 98.68 ± 0.17 98.66 ± 0.11 98.67 ± 0.12 96.85 ± 0.21
LOSO 89.85 ± 1.35 85.83 ± 3.11 87.78 ± 2.11 80.63 ± 1.81
LOSO_LOOAO 83.64 ± 2.46 85.56 ± 2.67 84.58 ± 2.39 73.76 ± 2.10
AdaBoost.M1 standard CV 99.96 ± 0.02 99.88 ± 0.03 99.92 ± 0.02 99.77 ± 0.05
LOSO 96.07 ± 0.99 85.76 ± 2.45 90.61 ± 1.72 84.14 ± 2.35
LOSO_LOOAO 91.81 ± 0.82 86.82 ± 1.71 89.24 ± 1.17 80.97 ± 1.20
’preReject’ C4.5 standard CV 98.28 ± 0.14 97.83 ± 0.12 98.05 ± 0.07 95.46 ± 0.14
LOSO 88.58 ± 1.40 78.66 ± 2.51 83.30 ± 1.36 71.78 ± 1.76
LOSO_LOOAO 83.07 ± 1.68 78.83 ± 3.63 80.87 ± 2.53 67.32 ± 2.74
AdaBoost.M1 standard CV 99.95 ± 0.04 99.89 ± 0.04 99.92 ± 0.04 99.82 ± 0.06
LOSO 93.85 ± 1.57 88.46 ± 2.26 91.07 ± 1.83 85.20 ± 2.07
LOSO_LOOAO 87.99 ± 1.47 87.98 ± 1.80 87.98 ± 1.58 79.11 ± 1.60
’postReject’ C4.5 standard CV 99.08 ± 0.09 98.21 ± 0.15 98.64 ± 0.10 96.89 ± 0.20
Table 5.8: Confusion matrix on the ‘extended’ task using the ‘allSeparate’ model,
AdaBoost.M1 classifier and LOSO evaluation technique. The table shows how
different annotated activities are classified in [%].
Annotated activity    Recognized activity: 1 2 3 4 5 6 0
1 lie 97.35 0.71 0 0 0 0 1.94
2 sit/stand 0.03 91.93 0 0 0 0 8.04
3 walk 0 0 89.08 0 0 0.21 10.71
4 run 0 0 0 73.76 0 0.01 26.23
5 cycle 0 0.01 0.03 0 96.05 0.06 3.85
6 Nordic walk 0 0 6.17 0 0.02 87.21 6.59
7 drive car 0 47.64 0 0 0.83 0 51.54
8 asc. stairs 0 0 0.62 0 0 0 99.38
9 desc. stairs 0 0 0 0 0.79 0 99.21
10 vacuum clean 0 0.01 0 0 0.16 0 99.83
11 iron 0 2.83 0 0 0.03 0 97.14
12 fold laundry 0 5.33 0 0 0.02 0 94.65
13 clean house 0.02 5.53 0.01 0 0.31 0 94.13
14 play soccer 0 0 2.48 10.66 0 0.28 86.58
15 rope jump 0 0 0 0.76 0 0 99.24
Table 5.9: Confusion matrix on the ‘extended’ task using the ‘allSeparate’ model,
AdaBoost.M1 classifier and LOSO_LOOAO evaluation technique. The table
shows how different annotated activities are classified in [%].
Annotated activity    Recognized activity: 1 2 3 4 5 6 0
1 lie 97.37 1.02 0 0 0 0 1.60
2 sit/stand 0.12 90.95 0 0 0 0 8.93
3 walk 0 0 91.22 0 0 0.22 8.55
4 run 0 0 0 75.89 0 0 24.11
5 cycle 0 0 0.06 0 95.11 0.19 4.64
6 Nordic walk 0 0 12.54 0 0 81.78 5.68
7 drive car 0.02 49.67 0 0 0.12 0 50.19
8 asc. stairs 0 0 12.49 0 0.91 1.48 85.12
9 desc. stairs 0 0 9.29 0.02 17.92 1.97 70.81
10 vacuum clean 0 0.02 0 0 9.17 0 90.81
11 iron 0 22.02 0 0 0.00 0 77.97
12 fold laundry 0 6.53 0 0 0.03 0 93.44
13 clean house 0.09 9.71 0.02 0 0.35 0 89.84
14 play soccer 0 0 4.66 31.67 0 2.43 61.24
15 rope jump 0 0 0.35 6.33 1.41 0 91.90
Table 5.10: Confusion matrix on the ‘extended’ task using the ‘postReject’
model, AdaBoost.M1 classifier and LOSO_LOOAO evaluation technique. The
table shows how different annotated activities are classified in [%].
Annotated activity    Recognized activity: 1 2 3 4 5 6 0
1 lie 88.58 0.45 0 0 0 0 10.98
2 sit/stand 0.31 89.79 0 0 0 0 9.90
3 walk 0 0 81.29 0 0 0.13 18.58
4 run 0 0 0 56.61 0 0.30 43.09
5 cycle 0 0 0 0 91.82 0.03 8.15
6 Nordic walk 0 0 6.00 0 0 75.81 18.19
7 drive car 0 36.70 0 0 0 0 63.30
8 asc. stairs 0 0 0.78 0 0 0 99.22
9 desc. stairs 0 0 0.22 0 1.22 0.33 98.23
10 vacuum clean 0 0 0 0 0.69 0 99.31
11 iron 0 16.54 0 0 0.05 0 83.41
12 fold laundry 0 2.35 0 0 0 0 97.65
13 clean house 0.41 6.37 0 0 0.09 0 93.13
14 play soccer 0 0 3.27 25.22 0 1.77 69.75
15 rope jump 0 0 0.03 2.91 0 0 97.07
‘bgClass’ model can be regarded as the model with the best generalization characteristic:
the approach using the ‘bgClass’ model and the AdaBoost.M1 classifier achieves
an average F-measure of 89.24% and an average accuracy of 80.97%. The confusion
matrix obtained with this approach is shown in Table 5.11 (averaged over the 10 test
runs). Most misclassifications clearly involve the other activities: either a sample
belonging to a basic activity class is classified into the background class, or a sample
of one of the other activities is confused with one of the basic activities. For example,
drive car and iron are frequently confused with the basic class sit/stand (39.10% and
20.02% of their samples, respectively). This is due to the overlapping characteristics of
some basic and other activities, as already discussed above. The strength of the ‘bgClass’
model is especially evident in the results obtained with other activities such as ascend
stairs, descend stairs, vacuum clean or rope jump: although previously unknown to the
system, these activities were almost never misclassified as a basic activity. It can
therefore be expected that the proposed approach is similarly robust to most other
unknown activities. Only unknown activities similar to the target activities may be
problematic for the ‘bgClass’ approach, as observed with drive car and iron, and as would
be expected with activities such as computer work or watch TV. However, the boundaries
of some basic activity classes are difficult to define, e.g. whether computer work should
be regarded as sitting or as a separate other class. The answer may depend strongly on
the actual application.
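The robustness argument above can be checked directly against the rows of Table 5.11. The following minimal Python sketch copies the rows of the 9 other activities from that table, computes for each the share of samples wrongly accepted as one of the basic classes 1 to 6, and flags those above a threshold. The 5% threshold and the function name `accepted_as_basic` are illustrative choices, not part of the evaluation in this thesis.

```python
from typing import Dict, List, Tuple

# Rows of Table 5.11 for the 9 other activities: percentages recognized as the
# basic classes 1-6, followed by the percentage assigned to class 0.
TABLE_5_11_OTHER: Dict[str, Tuple[List[float], float]] = {
    "drive car":    ([0, 39.10, 0, 0, 0.06, 0], 60.84),
    "asc. stairs":  ([0, 0, 0.53, 0, 0, 0.01], 99.46),
    "desc. stairs": ([0.08, 0, 1.97, 0.02, 0.84, 0.06], 97.04),
    "vacuum clean": ([0, 0, 0, 0, 0.42, 0], 99.58),
    "iron":         ([0, 20.02, 0, 0, 0.01, 0], 79.97),
    "fold laundry": ([0, 3.70, 0.01, 0, 0, 0], 96.29),
    "clean house":  ([0.26, 7.10, 0, 0, 0.06, 0], 92.58),
    "play soccer":  ([0, 0, 3.34, 32.08, 0, 0.13], 64.46),
    "rope jump":    ([0, 0, 0.11, 0.11, 0, 0], 99.78),
}

def accepted_as_basic(row: Tuple[List[float], float]) -> float:
    """Share [%] of an other activity's samples wrongly assigned to a basic class."""
    basic, _ = row
    return round(sum(basic), 2)

THRESHOLD = 5.0  # illustrative cut-off in percent, not taken from the thesis
problematic = [a for a, row in TABLE_5_11_OTHER.items()
               if accepted_as_basic(row) > THRESHOLD]
print(problematic)  # ['drive car', 'iron', 'clean house', 'play soccer']
```

Besides drive car and iron, the flagged set contains clean house and play soccer (the latter mostly confused with run), which fits the same argument about overlapping basic and other activities.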
Table 5.11: Confusion matrix on the ‘extended’ task using the ‘bgClass’ model,
AdaBoost.M1 classifier and LOSO_LOOAO evaluation technique. The table
shows how different annotated activities are classified in [%].
Annotated activity    Recognized activity: 1 2 3 4 5 6 0
1 lie 96.66 2.62 0 0 0 0 0.72
2 sit/stand 0.15 90.05 0 0 0 0 9.80
3 walk 0 0 85.87 0 0 0.13 14.00
4 run 0 0 0.16 76.24 0 0.30 23.31
5 cycle 0 0 0.01 0 92.43 0.03 7.53
6 Nordic walk 0 0 8.71 0 0 79.69 11.60
7 drive car 0 39.10 0 0 0.06 0 60.84
8 asc. stairs 0 0 0.53 0 0 0.01 99.46
9 desc. stairs 0.08 0 1.97 0.02 0.84 0.06 97.04
10 vacuum clean 0 0 0 0 0.42 0 99.58
11 iron 0 20.02 0 0 0.01 0 79.97
12 fold laundry 0 3.70 0.01 0 0 0 96.29
13 clean house 0.26 7.10 0 0 0.06 0 92.58
14 play soccer 0 0 3.34 32.08 0 0.13 64.46
15 rope jump 0 0 0.11 0.11 0 0 99.78
Table 5.13: Confusion matrix on the ‘intensity’ classification task using the
AdaBoost.M1 classifier and LOSO evaluation technique. The table shows how
different annotated activities are classified in [%].
Annotated activity    Estimated intensity: light moderate vigorous
1 lie 99.97 0.03 0
2 sit 99.98 0.02 0
3 stand 99.98 0.02 0
4 walk 0.02 96.38 3.60
5 run 0 0.07 99.93
6 cycle 0.44 99.45 0.11
7 Nordic walk 0 96.42 3.58
8 watch TV 100.00 0 0
9 computer work 100.00 0 0
10 drive car 97.26 2.74 0
11 asc. stairs 0.62 7.89 91.49
12 desc. stairs 1.01 93.23 5.75
13 vacuum clean 19.61 80.06 0.33
14 iron 98.85 1.15 0
15 fold laundry 90.36 9.64 0
16 clean house 72.97 26.74 0.29
17 play soccer 0 8.15 91.85
18 rope jump 0 0.03 99.97
Overall, the results on the ‘intensity’ task presented in this subsection (especially
those in Table 5.14) show that intensity estimation can be highly unreliable when
subjects perform previously unknown activities. On the one hand, this emphasizes the
importance of LOSO_LOAO evaluation: the simulation of a trained classifier’s
performance on unknown activities should not be neglected. On the other hand, these
results also encourage the development of more robust intensity estimation approaches
with better generalization characteristics.
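To make the difference between LOSO and the combined LOSO_LOOAO evaluation concrete, the following Python sketch generates the corresponding train/test index splits. It is one plausible reading, not a reproduction of Algorithm 5.1: it assumes that in each fold one subject and one other activity are left out of training, and that the classifier is then tested on all data of the left-out subject, so the held-out activity appears only at test time. The function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

def loso_splits(subjects: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Leave-one-subject-out: train on all other subjects, test on the held-out one."""
    for s in sorted(set(subjects)):
        train = [i for i, subj in enumerate(subjects) if subj != s]
        test = [i for i, subj in enumerate(subjects) if subj == s]
        yield s, train, test

def loso_looao_splits(
    subjects: Sequence[str],
    activities: Sequence[str],
    other_activities: Sequence[str],
) -> Iterator[Tuple[str, str, List[int], List[int]]]:
    """Combined LOSO and leave-one-other-activity-out (assumed fold definition).

    Training excludes both the held-out subject and the held-out other activity;
    testing uses all samples of the held-out subject, so the held-out activity
    is unknown to the trained classifier.
    """
    for s in sorted(set(subjects)):
        for a in other_activities:
            train = [i for i in range(len(subjects))
                     if subjects[i] != s and activities[i] != a]
            test = [i for i in range(len(subjects)) if subjects[i] == s]
            yield s, a, train, test

# Tiny illustrative example: 3 subjects, one basic and two other activities.
subjects = ["s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3", "s3"]
activities = ["walk", "iron", "rope jump"] * 3
folds = list(loso_looao_splits(subjects, activities, ["iron", "rope jump"]))
print(len(folds))  # 3 subjects x 2 other activities = 6 folds
```

The number of folds grows with the product of subjects and other activities, which is the price for simulating both kinds of unseen conditions at once.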
Table 5.14: Confusion matrix on the ‘intensity’ classification task using the
AdaBoost.M1 classifier and LOSO_LOAO evaluation technique. The table shows how
different annotated activities are classified in [%].
Annotated activity    Estimated intensity: light moderate vigorous
1 lie 99.97 0.03 0
2 sit 99.99 0.01 0
3 stand 99.86 0.14 0
4 walk 0.26 44.81 54.93
5 run 0 3.28 96.72
6 cycle 52.52 47.07 0.41
7 Nordic walk 0 74.59 25.41
8 watch TV 100.00 0 0
9 computer work 100.00 0 0
10 drive car 97.72 2.28 0
11 asc. stairs 0.73 97.07 2.20
12 desc. stairs 2.98 25.18 71.84
13 vacuum clean 87.27 12.19 0.54
14 iron 96.14 3.86 0
15 fold laundry 90.91 9.09 0
16 clean house 45.96 53.50 0.55
17 play soccer 0.18 29.49 70.33
18 rope jump 0 3.21 96.79
5.7 Conclusion
This chapter developed the means to simulate everyday life scenarios and thus to
evaluate the robustness of activity recognition and intensity estimation, an aspect
usually neglected in the development of physical activity monitoring systems.
Experiments were carried out on classification problems defined on the recently
released PAMAP2 physical activity monitoring dataset. An activity recognition task
was defined, including 6 basic activity classes and 9 different other activities. The
goal of this classification task was the accurate recognition and separation of the basic
activities, while samples of the other activities should be recognized as part of
an other-activity class or be rejected. In addition, an intensity estimation task
was defined that includes all 18 activities of the PAMAP2 dataset. The goal of this
classification task was to distinguish activities of light, moderate and vigorous effort.
Common data processing and classification methods were used to achieve the
classification goals, comparing two classification algorithms that had been successfully
applied in previous work: the C4.5 decision tree classifier and the AdaBoost.M1
algorithm. Moreover, to deal with other activities in the activity recognition task,
4 different models were proposed: ‘allSeparate’, ‘bgClass’, ‘preReject’ and ‘postReject’.
Finally, the proposed methods were evaluated with different techniques, including
standard CV, LOSO and the newly introduced LOAO. Standard 10-fold CV was included
only for comparison, to underline how unrealistic the performance achieved this way
is for everyday life scenarios. The LOSO technique serves to simulate subject
independency, while LOAO simulates the scenario of performing unknown other
activities. Considering the activity recognition task, the results of the thorough
evaluation process revealed that the ‘bgClass’ model has the best generalization
characteristic, while the generalization capability of the widely used ‘allSeparate’
approach is rather limited with respect to previously unknown activities. As for the
intensity estimation task, the results showed that classification can be highly unreliable
when dealing with previously unknown activities, which encourages the improvement of
existing approaches.
Developing physical activity monitoring systems while also taking into account e.g.
subject independency or unknown activities has two important benefits compared to
using standard CV evaluation only. First, it estimates how the developed system
behaves in various everyday life scenarios, behaviour that would otherwise remain
unknown. Second, applying LOSO and LOAO evaluation during the development phase
allows the best-performing models and algorithms to be selected, thus creating the
most robust possible system for everyday life. In future work it is planned to
investigate how well the developed approaches generalize to user groups (e.g. elderly
people) that differ significantly from the subjects included in the PAMAP2 dataset
(all young, healthy adults). It is also planned to investigate the effect of increasing
the number of known (i.e. included in training) other activities, with the goal of
further increasing the robustness towards unknown other activities while keeping the
high performance on the basic activity classes.
6 Confidence-based Multiclass AdaBoost
6.1 Introduction
The use of meta-level classifiers for physical activity monitoring problems is not as
widespread as the use of different base-level classifiers. However, comparisons of
base-level and meta-level classifiers on different activity recognition tasks show that
meta-level classifiers (such as boosting, bagging, plurality voting, etc.) outperform
base-level classifiers [131]. In [137], a complex activity recognition problem including
13 different physical activities is used to evaluate the most widely used base-level
(decision trees, k-Nearest Neighbors (kNN), Support Vector Machines (SVM) and Naive
Bayes classifiers) and meta-level (bagging, boosting) classifiers. The best performance
was achieved with a boosted decision tree classifier. The benchmark results on the
PAMAP2 dataset in Section 4.4 confirm that a boosted C4.5 decision tree classifier is
one of the most promising methods for physical activity monitoring.
Apart from the good performance results mentioned above, the boosted decision tree
classifier has further benefits: it is a fast classification algorithm with a simple
structure and is therefore easy to implement. These benefits are especially important
for physical activity recognition applications, since these usually run on mobile,
portable systems for everyday use, where the available computational power is
limited.1 Section 8.3.2 will show the feasibility of using a boosted decision tree
classifier for physical activity monitoring on a mobile platform. Moreover, boosted
decision trees have been widely and successfully used in other research fields, e.g.
recently in multi-task learning [45]. Therefore, considering all the benefits mentioned
above, this chapter focuses on boosting, and in particular on boosted decision tree
classifiers, for physical activity monitoring.
1 This is why, e.g., kNN (which also showed generally good performance on activity
recognition tasks) is not considered further here: it is a computationally intensive
algorithm, even in its more advanced versions that reduce the number of distance
comparisons.
The benchmark results on the PAMAP2 dataset reveal that the difficulty of the
more complex tasks exceeds the potential of existing classifiers. Moreover, the results
in Chapter 5 show rather low performance when fully simulating how the most
common classifiers perform in everyday life scenarios: none of the results reached
an F-measure of 90% on the ‘extended’ task with the LOSO_LOOAO evaluation
technique. There is therefore a clear need to modify and improve existing algorithms.
This chapter proposes a confidence-based extension of the well-known AdaBoost.M1
algorithm, called ConfAdaBoost.M1, which builds on established ideas of existing
boosting methods. The main contribution of this chapter is thus the ConfAdaBoost.M1
algorithm itself, together with the demonstration that it significantly improves on the
results of previous boosting algorithms.
This chapter is organized as follows: Section 6.2 gives an overview of existing boosting
algorithms, highlighting their benefits and drawbacks. The new ConfAdaBoost.M1
algorithm is introduced in Section 6.3. In Section 6.4 the new algorithm is evaluated
on various benchmark datasets from the UCI machine learning repository and compared
to the most commonly used existing boosting methods. Section 6.5 presents the
evaluation on a complex activity recognition and intensity estimation problem defined
on the PAMAP2 dataset. The main motivation for presenting the ConfAdaBoost.M1
algorithm is the better performance it achieves on activity monitoring classification
tasks compared to existing algorithms. Finally, the chapter is summarized in Section 6.6.
7: for i ← 1, N do
8:     if $y_i \neq f_t(x_i)$ then
9:         $w_i \leftarrow w_i \, e^{\alpha_t}$
10:    end if
11: end for
12: Normalize the weight of all instances so that $\sum_i w_i = 1$
13: end for
14: end procedure
of the most important ensemble methods, and is named one of the top 10 data mining
algorithms by Wu et al. [194].
Already the first version of AdaBoost defines the main ideas of the boosting technique
[55]. Assume that a training dataset of N instances is given: $(x_i, y_i)$, $i = 1, \ldots, N$,
where $x_i$ is the feature vector and $y_i \in \{-1, +1\}$. The algorithm trains the weak
learners $f_t(x)$ on weighted versions of the training dataset, giving higher weight to
instances that are currently misclassified. This is done for a predefined number T of
iterations. The final classifier is a linear combination of the weak learners from each
iteration, weighted according to their error rate on the training dataset. This first
version of the AdaBoost algorithm was designed only for binary classification problems.
Any kind of classifier can be used as a weak learner, as long as it is better than random
guessing. However, this version of AdaBoost only uses the binary output of the weak
learners (−1 or +1) and was therefore called Discrete AdaBoost in [58]. The algorithm is
shown in Algorithm 6.1.
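The procedure just described can be sketched in a few lines of Python. This is a minimal illustration of Discrete AdaBoost, assuming decision stumps (single-feature threshold rules) as weak learners and the weight $\alpha_t = \log((1-e_t)/e_t)$ for each weak learner, as in Friedman et al. [58]; misclassified instances are up-weighted by $e^{\alpha_t}$ before renormalization. The helper names are hypothetical, and the exhaustive stump search is chosen for clarity, not efficiency.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)

def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] > thr else -pol

def fit_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively search the stump with the lowest weighted error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in thresholds:
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best, best_err

def discrete_adaboost(X, y, T):
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(T):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:            # no better than random guessing: stop
            break
        alpha = math.log((1.0 - err) / max(err, 1e-10))
        ensemble.append((alpha, stump))
        if err == 0.0:            # training data fitted perfectly
            break
        # Up-weight misclassified instances by e^alpha, then renormalize.
        w = [wi * math.exp(alpha) if stump_predict(stump, xi) != yi else wi
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x) -> int:
    """Sign of the alpha-weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]          # not separable by a single stump
model = discrete_adaboost(X, y, T=3)
print([predict(model, x) for x in X] == y)  # True
```

On this toy data no single stump is correct everywhere, but three reweighted stumps combined by their weights classify all six instances correctly.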
A generalization of Discrete AdaBoost is to use real-valued predictions of the weak
learners rather than the {−1, +1} output. Friedman et al. [58] introduced the Real
AdaBoost algorithm, shown in Algorithm 6.2. In this version of AdaBoost the weak
learners return a class probability estimate $p_t(x)$ in each boosting iteration, from
which the classification rule $f_t(x)$ is derived. The sign of $f_t(x)$ gives the
classification prediction, and $|f_t(x)|$ gives a measure of how confident the weak
learner is in the prediction. Experiments by Friedman et al. [58] on various datasets
from the UCI machine learning repository [12] show that this confidence-based version
of AdaBoost outperforms the original Discrete AdaBoost algorithm. However, Real
AdaBoost is also limited to binary classification problems.
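For Real AdaBoost, Friedman et al. [58] derive the weak learner's contribution from the class probability estimate as $f_t(x) = \frac{1}{2}\log\frac{p_t(x)}{1-p_t(x)}$ and update the weights by $w_i \leftarrow w_i e^{-y_i f_t(x_i)}$. The Python sketch below shows only these two steps, assuming $p_t(x)$ is the estimated probability of the class +1; the clipping constant `eps` is an added safeguard against $\log 0$, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def real_weak_output(p: float, eps: float = 1e-10) -> float:
    """f_t(x) = 1/2 * log(p / (1 - p)) for p = estimated P(y = +1 | x)."""
    p = min(max(p, eps), 1.0 - eps)   # avoid log(0) for pure probability estimates
    return 0.5 * math.log(p / (1.0 - p))

def real_update(w: Sequence[float], y: Sequence[int], p: Sequence[float]) -> List[float]:
    """w_i <- w_i * exp(-y_i * f_t(x_i)), followed by normalization."""
    new_w = [wi * math.exp(-yi * real_weak_output(pi)) for wi, yi, pi in zip(w, y, p)]
    total = sum(new_w)
    return [wi / total for wi in new_w]

print(real_weak_output(0.5))   # 0.0: no preference, hence no contribution
print(real_update([0.5, 0.5], [1, 1], [0.8, 0.2]))
```

In the usage example both instances belong to class +1; the confidently correct one (p = 0.8) loses weight, while the confidently wrong one (p = 0.2) gains weight, so the magnitude of $f_t(x)$ directly controls how strongly each instance is reweighted.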
Apart from Discrete and Real AdaBoost, further boosting methods have been developed
for the binary classification case in the past decade. Friedman et al. [58] show that the
Discrete and Real AdaBoost algorithms can be interpreted as stage-wise estimation
procedures for fitting an additive logistic regression model, optimizing an exponential
criterion which to second order is equivalent to the binomial log-likelihood criterion.
Based on this interpretation of AdaBoost, they introduce the LogitBoost algorithm,
which optimizes a more standard (the Bernoulli) log-likelihood. Moreover, Friedman
et al. [58] also present the Gentle AdaBoost algorithm, a modified version of Real
AdaBoost that uses Newton stepping rather than exact optimization at each boosting
iteration. Another variant of Real AdaBoost, which uses a weighted emphasis function,
is Emphasis Boost [60]. Finally, the Modest AdaBoost algorithm [184] should be
mentioned. It not only considers the updated weight distribution for training a
classification rule in each boosting step, but also considers the inverse weight
distribution to decrease a weak learner’s contribution if it works “too well” on data
that has already been correctly classified with a high margin. As a result, although the
training error decreases more slowly than for comparable methods, Modest AdaBoost
produces a lower generalization error.
The first extensions of AdaBoost for multiclass classification problems can be regarded
as pseudo-multiclass solutions: they reduce the multiclass problem to multiple
two-class problems [153, 155]. One of the most common solutions using binary boosting
methods for multiclass problems is AdaBoost.MH, introduced by Schapire and Singer
[155]. It converts a C-class problem into that of estimating a two-class classifier on a
training set C times as large, by adding a new “feature” that is defined by the class
labels. Thus the original N instances are expanded into N·C instances. A binary
AdaBoost method (e.g. Discrete or Real AdaBoost) can then be applied to this new,
augmented dataset. Algorithm 6.3 shows the Real AdaBoost.MH algorithm: the
extension of the previously presented Real AdaBoost algorithm to the multiclass case
using the AdaBoost.MH technique.
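The data expansion behind AdaBoost.MH can be illustrated as follows. The Python sketch pairs every instance with every class label (the added “feature”) and assigns the binary label +1 only to the pair with the true class; any binary booster could then be trained on the result, and a new instance would be assigned the class whose augmented pair receives the highest score. The function name is hypothetical and the booster itself is omitted.

```python
from typing import Hashable, List, Sequence, Tuple

def expand_for_mh(
    X: Sequence[Sequence[float]], y: Sequence[Hashable], classes: Sequence[Hashable]
) -> Tuple[List[Tuple[Tuple[float, ...], Hashable]], List[int]]:
    """Turn N instances with C classes into N*C binary-labelled instances."""
    X_aug: List[Tuple[Tuple[float, ...], Hashable]] = []
    y_aug: List[int] = []
    for xi, yi in zip(X, y):
        for c in classes:
            X_aug.append((tuple(xi), c))        # original features plus class "feature"
            y_aug.append(1 if yi == c else -1)  # +1 only for the true class
    return X_aug, y_aug

X = [[0.1, 2.0], [0.7, 1.5]]
y = ["walk", "run"]
X_aug, y_aug = expand_for_mh(X, y, ["walk", "run", "cycle"])
print(len(X_aug), y_aug)  # 6 [1, -1, -1, -1, 1, -1]
```

The factor C in the size of the augmented training set is exactly what makes AdaBoost.MH expensive on tasks with many classes and instances.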
Other solutions exist to reduce the multiclass problem to multiple binary classification
problems. Schapire [153] combined error-correcting output codes (ECOC) with the
original binary AdaBoost method to solve multiclass problems, resulting in the
AdaBoost.MO algorithm. Friedman et al. [58] showed how the binary LogitBoost
algorithm can be applied to the multiclass case by introducing a “class feature” similar
to the AdaBoost.MH method. In [153] experimental results are given
The first direct multiclass extension of the original AdaBoost algorithm, AdaBoost.M1,
was introduced in [55] and is the most widely used multiclass boosting method. It is
also the basis of many further variants of multiclass boosting. The AdaBoost.M1
algorithm is shown in Algorithm 6.4. Similar to the binary AdaBoost methods, it can
be used with any weak classifier that has an error rate of less than 0.5. However, this
criterion is more restrictive than in the binary case, where an error rate of 0.5
basically corresponds to random guessing. In [55] a second multiclass extension of the
original AdaBoost algorithm, AdaBoost.M2, was also introduced. In this algorithm the
weak classifiers have to minimize a newly introduced pseudo-loss instead of the error
rate, as is usually done. The pseudo-loss of the weak classifiers has to be less than 0.5,
but this is a much weaker condition than the error rate being less than 0.5. The
drawback of AdaBoost.M2 is that classifiers have to be redesigned in order to be used as
weak learners within this algorithm, since almost all traditionally used classifiers
minimize the error rate and not the new pseudo-loss.
In [41, 203], another way to overcome the restriction on the weak learner’s error rate
is shown: adding a constant that takes the number of classes C into account relaxes the
requirement on the weak classifiers to an error rate below that of random guessing,
$1 - 1/C$. Eibl and Pfeiffer [41] introduced the AdaBoost.M1W algorithm based on this
idea and demonstrated its benefits over AdaBoost.M1 experimentally. The SAMME
(Stagewise Additive Modeling using a Multi-class Exponential loss function) algorithm
of Zhu et al. [203] is based on the same idea, as shown in Algorithm 6.5.
SAMME has the same structure as AdaBoost.M1; the only difference is on line 9, where
the term log(C − 1) is added. Zhu et al. [203] show that this extra term is not ar-
10: for i ← 1, N do
11:     if $y_i \neq f_t(x_i)$ then
12:         $w_i \leftarrow w_i \, e^{\alpha_t}$
13:     end if
14: end for
15: Normalize the weight of all instances so that $\sum_i w_i = 1$
16: end for
17: end procedure
10: for i ← 1, N do
11:     if $y_i \neq f_t(x_i)$ then
12:         $w_i \leftarrow w_i \, e^{\alpha_t}$
13:     end if
14: end for
15: Normalize the weight of all instances so that $\sum_i w_i = 1$
16: end for
17: end procedure
SAMME.R algorithm showed slightly worse overall performance than SAMME on
different datasets [202] and is therefore not considered further in this work.
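The effect of the extra term in SAMME can be seen by comparing the weak-learner weights of the two algorithms. The Python sketch below assumes the usual forms $\alpha_t = \log\frac{1-e_t}{e_t}$ for AdaBoost.M1 and $\alpha_t = \log\frac{1-e_t}{e_t} + \log(C-1)$ for SAMME [203]: for C = 2 both coincide, while for C > 2 SAMME keeps a positive weight for any error below $1 - 1/C$, where AdaBoost.M1 would already have stopped at $e_t \geq 0.5$. Returning `None` to signal the stopping criterion is an illustrative convention, and the function names are hypothetical.

```python
import math
from typing import Optional

def alpha_adaboost_m1(err: float) -> Optional[float]:
    """AdaBoost.M1 weight; None means the stopping criterion e_t >= 0.5 is met."""
    if err >= 0.5:
        return None
    err = max(err, 1e-10)  # guard against division by zero for a perfect learner
    return math.log((1.0 - err) / err)

def alpha_samme(err: float, n_classes: int) -> Optional[float]:
    """SAMME weight: the extra log(C - 1) term relaxes the requirement to e_t < 1 - 1/C."""
    if err >= 1.0 - 1.0 / n_classes:
        return None
    err = max(err, 1e-10)
    return math.log((1.0 - err) / err) + math.log(n_classes - 1)

print(alpha_adaboost_m1(0.6))   # None: AdaBoost.M1 stops
print(alpha_samme(0.6, 6) > 0)  # True: with 6 classes, 0.6 is still better than guessing
```

At exactly $e_t = 1 - 1/C$ the SAMME weight would be zero, which is why that value acts as the natural boundary for the multiclass case.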
Another multiclass boosting method is introduced in [72]: GAMBLE (Gentle Adaptive
Multiclass Boosting Learning) is the multiclass generalization of the binary Gentle
AdaBoost algorithm. However, GAMBLE fits a regression model rather than a
classification model at each boosting iteration, and thus requires several additional
steps in order to be used for classification tasks (which are the actual focus of this
chapter). First, the class labels have to be encoded (e.g. with response encoding); then
the regression model is fitted, from which the weak classifier is obtained. Overall, the
training time and computational cost are significantly higher than for AdaBoost
variants that directly use classification models.
11: for i ← 1, N do
12:     $w_i \leftarrow w_i \, e^{\left(\frac{1}{2} - I(y_i = f_t(x_i))\right) p_{ti} \, \alpha_t}$    % I(·) refers to the indicator function
13: end for
14: Normalize the weight of all instances so that $\sum_i w_i = 1$
15: end for
16: end procedure
6.3 ConfAdaBoost.M1
Various boosting algorithms exist, as presented in the previous section. However, there
are still classification problems where the difficulty of the task exceeds the potential
of existing methods. Examples of such complex tasks in the field of physical activity
monitoring were shown in the benchmark of [135, 136]. Moreover, experiments
presented in this chapter show a high error rate of selected, commonly used boosting
algorithms on the PAMAP2 physical activity monitoring dataset. Therefore, boosting
techniques need to be developed further to improve the performance on such complex
classification tasks.
It should be noted that the stopping criterion $e_t \geq 0.5$ of the original AdaBoost.M1
remains the same in the new ConfAdaBoost.M1 algorithm (line 7 of Algorithm 6.6).
This means that, as with AdaBoost.M1, only classifiers achieving a reasonably high
accuracy can be used as weak learners; decision stumps, for example, are not suitable
for multiclass problems. However, the stopping criterion $e_t \geq 0.5$ is less restrictive
in ConfAdaBoost.M1: since the computation of the error rate also uses the confidence
values $p_{ti}$, the computed $e_t$ is lower than in the original AdaBoost.M1 algorithm.
Therefore, with the same weak learner, ConfAdaBoost.M1 can perform significantly
more boosting iterations than AdaBoost.M1 before stopping, as shown in the
experiments of the next sections.
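The following Python sketch illustrates why the confidence values lower $e_t$. It rests on an assumption that is not spelled out in this section: that the error of ConfAdaBoost.M1 is the sum of $w_i p_{ti}$ over the misclassified instances, so each mistake counts only as much as the weak learner was confident in it. The weight update follows the ConfAdaBoost.M1 listing above, $w_i \leftarrow w_i e^{(\frac{1}{2} - I(y_i = f_t(x_i))) p_{ti} \alpha_t}$, and $\alpha_t = \log((1-e_t)/e_t)$ is assumed as in AdaBoost.M1. All function names are hypothetical.

```python
import math
from typing import List, Sequence

def plain_error(w: Sequence[float], correct: Sequence[bool]) -> float:
    """AdaBoost.M1 error: total weight of the misclassified instances."""
    return sum(wi for wi, ok in zip(w, correct) if not ok)

def conf_error(w: Sequence[float], correct: Sequence[bool], p: Sequence[float]) -> float:
    """ASSUMED ConfAdaBoost.M1 error: misclassified weight scaled by confidence p_ti."""
    return sum(wi * pi for wi, ok, pi in zip(w, correct, p) if not ok)

def conf_update(w: Sequence[float], correct: Sequence[bool],
                p: Sequence[float], alpha: float) -> List[float]:
    """w_i <- w_i * exp((1/2 - I(y_i = f_t(x_i))) * p_ti * alpha_t), then normalize."""
    new_w = [wi * math.exp((0.5 - (1.0 if ok else 0.0)) * pi * alpha)
             for wi, ok, pi in zip(w, correct, p)]
    total = sum(new_w)
    return [wi / total for wi in new_w]

w = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, False, False]
p = [0.9, 0.9, 0.6, 0.7]            # confidence of each prediction
e_plain = plain_error(w, correct)   # 0.5: AdaBoost.M1 would stop here
e_conf = conf_error(w, correct, p)  # about 0.325: boosting can continue
alpha = math.log((1 - e_conf) / e_conf)
w_next = conf_update(w, correct, p, alpha)
```

In this example the same weak learner hits the AdaBoost.M1 stopping criterion, whereas the confidence-weighted error stays well below 0.5, matching the behaviour described above; misclassified instances still gain weight, and confidently predicted ones are reweighted most strongly.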
and let $S_c$ be the subset of $S$ belonging to class $c$. The confidence of the prediction is
then:
$$p_{ti} = \frac{\sum_{j \in S_c} w_j}{\sum_{j \in S} w_j} \qquad (6.1)$$
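Equation (6.1) is the weighted share of the predicted class among the training instances that reach a node. The Python sketch below assumes that S is the set of weighted training instances in the leaf reached by $x_i$ and that c is the class predicted by that leaf (its weighted majority class); the function name and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

def leaf_prediction_and_confidence(
    labels: Sequence[Hashable], weights: Sequence[float]
) -> Tuple[Hashable, float]:
    """Return the weighted-majority class c of a leaf and its confidence p (eq. 6.1)."""
    class_weight: Dict[Hashable, float] = defaultdict(float)
    for lab, w in zip(labels, weights):
        class_weight[lab] += w                              # sum of w_j over S_c
    predicted = max(class_weight, key=class_weight.get)     # weighted majority class c
    confidence = class_weight[predicted] / sum(weights)     # divided by sum over S
    return predicted, confidence

c, p = leaf_prediction_and_confidence(["walk", "walk", "run"], [0.3, 0.3, 0.4])
print(c, round(p, 2))  # walk 0.6
```

Because the boosting weights enter the ratio, the same leaf can report different confidences in different boosting iterations.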
On the 5 larger, pre-partitioned datasets, pruned C4.5 decision trees are used. The
level of pruning is defined by 5-fold cross-validation (CV) on the training part of these
datasets, separately for each of the evaluated boosting methods. On the 3 smaller
datasets (Glass, Iris and Vehicle), non-pruned C4.5 decision trees are used as weak
learners. Between 1 and 500 boosting iterations are evaluated for all algorithms and
benchmark datasets (previous work, e.g. [41, 203], showed that the performance of
various boosting algorithms usually levels off after at most 100 iterations). All results
presented below are averages over multiple test runs. On datasets providing separate
training and test parts, training is performed 10 times on the training set, and the
trained classifier is evaluated on the provided test set each time. On datasets without a
predefined test part, 10-fold CV is performed 10 separate times. All experiments were
performed in Matlab; random substreams were used to ensure randomness between
different test runs.
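The protocol for datasets without a predefined test part (10 separate runs of 10-fold CV, each with its own random stream) can be sketched in Python as below. Python's `random.Random` with one seed per run is used here as a stand-in for the Matlab random substreams mentioned above; the seeding scheme and the function name are assumptions for illustration.

```python
import random
from typing import Iterator, List, Tuple

def repeated_kfold(n: int, k: int = 10, repeats: int = 10, seed: int = 0
                   ) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (run, fold, train_indices, test_indices) for `repeats` runs of k-fold CV."""
    for run in range(repeats):
        rng = random.Random(seed + run)          # separate random stream per run
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[f::k] for f in range(k)]  # k nearly equal-sized folds
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield run, f, train, test

splits = list(repeated_kfold(n=25))
print(len(splits))  # 10 runs x 10 folds = 100
```

Within each run every instance is tested exactly once; across runs only the random partition changes, which is what the averaging over test runs is meant to smooth out.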
Figure 6.1: Test error of the 5 evaluated boosting algorithms on the UCI benchmark datasets Glass, Iris, Vehicle and Letter. The results are averages over 10 test runs. [Four panels (Glass, Iris, Vehicle, Letter): test error versus boosting iteration on a logarithmic axis from 10^0 to 10^3, with curves for AdaBoost.M1, QuinlanAdaBoost.M1, ConfAdaBoost.M1, SAMME and AdaBoost.MH.]
rather small datasets is difficult since it can occur due to sampling fluctuations, while
on the larger datasets clearer trends are observable.
One of the main reasons why AdaBoost.M1 and QuinlanAdaBoost.M1 perform
significantly worse than the other methods is that they quickly reach the stopping
criterion $e_t \geq 0.5$. This can be observed especially on the datasets Glass, Vehicle and
Satimage: the test error decreases at first but already levels off at around 10 to 20
boosting iterations, and no further improvement is reached by increasing the number
of boosting rounds. This effect is not observed with the ConfAdaBoost.M1 algorithm,
owing to the modified computation of the weak learners' error rate. Another benefit of
ConfAdaBoost.M1 over the other methods can be observed, e.g., on the datasets
Vehicle, Letter and Satimage: even at low numbers of boosting iterations,
ConfAdaBoost.M1 yields the lowest test error. This means that fewer boosting rounds
are necessary with ConfAdaBoost.M1 to reach a given level of accuracy, so a smaller
classifier suffices for the same performance compared to existing boosting algorithms.
This quality is especially beneficial when the available computational resources are
limited, which is usually the case for physical activity monitoring applications.
Figure 6.2: Test error of the 5 evaluated boosting algorithms on the UCI benchmark datasets Pendigits, Satimage, Segmentation and Thyroid. The results are averages over 10 test runs. [Four panels (Pendigits, Satimage, Segmentation, Thyroid): test error versus boosting iteration on a logarithmic axis from 10^0 to 10^3, with curves for AdaBoost.M1, QuinlanAdaBoost.M1, ConfAdaBoost.M1, SAMME and AdaBoost.MH; the Thyroid error axis is scaled by 10^−3.]
Finally, it is worth discussing and comparing the training time required to create
the different classifiers. Building a decision tree has a time complexity of
$O(DMN\log N)$, where N is the number of training instances, M is the dimension
of the feature vector of a training instance, and D is the average depth of the decision
tree [202]. The computational cost of AdaBoost.M1 is then $O(DMN\log(N)\,T)$,
where T is the number of boosting iterations. The theoretical complexity of the
algorithms QuinlanAdaBoost.M1, SAMME and the newly proposed ConfAdaBoost.M1
is similar. The computational cost of AdaBoost.MH is $O(DMN\log(N)\,TC)$, where C
refers to the number of classes. During the experiments of this section, the training
time of ConfAdaBoost.M1 was comparable to that of SAMME on all 8 evaluated
datasets. Compared to these two algorithms, the training time of AdaBoost.M1 and
QuinlanAdaBoost.M1 was almost an order of magnitude lower. This can be explained
by the early reaching of the stopping criterion, as discussed in the previous paragraph
(T becomes smaller in $O(DMN\log(N)\,T)$). On the other hand, the training time
required for AdaBoost.MH was 20 to 40 times larger than for ConfAdaBoost.M1 on the
larger datasets (e.g. Letter or Pendigits). Training AdaBoost.MH is therefore not
feasible for extremely large datasets.
Table 6.2: Comparison of the 5 evaluated boosting algorithms: test error rates [%] on the selected benchmark datasets. The
results are averaged over 10 test runs (mean and standard deviation are given); the best performance is shown for each of
the methods.
Dataset AdaBoost.M1 QuinlanAdaBoost.M1 ConfAdaBoost.M1 SAMME AdaBoost.MH
Glass 26.26 ± 1.42 26.17 ± 2.60 21.12 ± 1.22 22.29 ± 1.38 22.24 ± 1.81
Iris 4.73 ± 0.73 5.00 ± 0.85 4.27 ± 0.64 4.47 ± 1.22 4.60 ± 0.80
Vehicle 24.72 ± 1.05 24.52 ± 1.10 21.75 ± 0.44 22.35 ± 1.14 23.48 ± 1.21
Letter 3.28 ± 0.14 3.19 ± 0.15 2.64 ± 0.11 2.99 ± 0.13 5.68 ± 0.39
Pendigits 3.16 ± 0.27 3.14 ± 0.45 2.51 ± 0.11 3.08 ± 0.15 2.70 ± 0.08
Satimage 10.63 ± 0.80 10.47 ± 1.01 8.07 ± 0.15 8.79 ± 0.25 9.50 ± 0.39
Segmentation 6.36 ± 1.03 6.55 ± 1.08 4.31 ± 0.20 5.92 ± 0.79 5.22 ± 0.78
Thyroid 0.61 ± 0.04 0.59 ± 0.05 0.61 ± 0.05 0.60 ± 0.06 0.64 ± 0.08
PAMAP2_AR 29.28 ± 1.40 27.90 ± 1.06 22.22 ± 0.77 27.98 ± 1.34 n/a
PAMAP2_IE 7.98 ± 1.04 7.73 ± 0.66 5.60 ± 0.31 7.81 ± 0.60 n/a
light effort (< 3.0 METs); walk, cycle, descend stairs, vacuum clean and Nordic walk
as activities of moderate effort (3.0-6.0 METs); run, ascend stairs, rope jump and play
soccer as activities of vigorous effort (> 6.0 METs).
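The assignment of activities to intensity classes via the MET thresholds above can be written as a small Python function. Treating exactly 3.0 and exactly 6.0 METs as moderate follows the stated ranges (< 3.0, 3.0-6.0, > 6.0); the function name is hypothetical, and the example MET values are illustrative inputs rather than values from the PAMAP2 annotation.

```python
def intensity_class(met: float) -> str:
    """Map a MET value to light (< 3.0), moderate (3.0-6.0) or vigorous (> 6.0) effort."""
    if met < 3.0:
        return "light"
    if met <= 6.0:
        return "moderate"
    return "vigorous"

print([intensity_class(m) for m in (1.0, 3.0, 6.0, 8.0)])
# ['light', 'moderate', 'moderate', 'vigorous']
```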
Unlike the 8 UCI benchmark datasets used for the experiments in the previous section, the PAMAP2 dataset does not provide a feature vector for each instance, but only raw sensor data from the 3 IMUs and the heart rate monitor. The raw signals therefore have to be processed before classification algorithms can use them. A data processing chain consisting of preprocessing, segmentation and feature extraction steps is applied to the raw sensor data (these steps are described in more detail in Section 4.2). In total, 137 features are extracted: 133 features from IMU acceleration data (such as mean, standard deviation, energy, entropy, correlation, etc.) and 4 features from heart rate data (mean and gradient). These features serve as input to the classification step, in which the different boosting algorithms are evaluated. The main parameters of the PAMAP2 classification tasks are summarized in Table 6.1. Compared to the other datasets of Table 6.1, the classification problems defined on the PAMAP2 dataset are considerably more complex in terms of the number of instances and especially the number of variables. To get a first impression of the difficulty of these tasks, a C4.5 decision tree classifier is evaluated: averaged over 10 test runs, it reaches an accuracy of 65.79% on the PAMAP2_AR task and 88.98% on the PAMAP2_IE task. This result serves as the baseline performance and shows that there is considerable room for improvement, which the boosting methods are expected to exploit.
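To make the feature extraction step more concrete, the sketch below computes a few of the listed feature types (mean, standard deviation, energy, pairwise correlation) for one window of 3-axis acceleration. It is an illustration under stated assumptions, not the 137-feature set of Section 4.2: energy is assumed here to be the mean of squared samples (FFT-based definitions are also common), entropy is omitted, and the names `window_features` and `pearson` are hypothetical.

```python
import math
from statistics import fmean, pstdev


def pearson(x, y):
    """Pearson correlation of two equally long signals (0.0 if one is constant)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def window_features(ax, ay, az):
    """A few time-domain features of one window of 3-axis acceleration."""
    feats = {}
    for axis, sig in (("x", ax), ("y", ay), ("z", az)):
        feats[f"mean_{axis}"] = fmean(sig)
        feats[f"std_{axis}"] = pstdev(sig)
        # Assumption: energy as the mean of squared samples.
        feats[f"energy_{axis}"] = fmean(v * v for v in sig)
    feats["corr_xy"] = pearson(ax, ay)
    feats["corr_xz"] = pearson(ax, az)
    feats["corr_yz"] = pearson(ay, az)
    return feats


print(window_features([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]))
```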
The experiments in this section compare the newly introduced ConfAdaBoost.M1 algorithm to the boosting methods AdaBoost.M1, QuinlanAdaBoost.M1 and SAMME. The selection of these algorithms for comparison was already explained in Section 6.4.1. AdaBoost.M1 is compared here, but AdaBoost.MH is not: it would require an infeasible training time, given the complexity of the classification tasks and the fact that its effective training set is a multiple of that of the other algorithms (cf. also the discussion in Section 6.4.2). As in the previous section, the C4.5 decision tree is used as the weak learner for each of the boosting algorithms. An important difference from the previous section is the evaluation technique. As discussed in Section 5.1.2, a subject-independent validation technique best reflects the goals of systems and applications using physical activity recognition. Therefore, leave-one-subject-out (LOSO) 9-fold cross-validation is used in this section, and each method is evaluated from 1 up to 500 boosting iterations.
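Leave-one-subject-out cross-validation creates one fold per subject, so the 9 subjects of PAMAP2 yield 9 folds. A minimal sketch of the fold generation, assuming each instance is tagged with a subject ID; the function name `loso_folds` is hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def loso_folds(subject_ids: Sequence[int]) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held-out subject, train indices, test indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Toy example: 5 instances from 3 subjects
subjects = [1, 1, 2, 3, 3]
for subject, train, test in loso_folds(subjects):
    print(subject, train, test)
```

Because no instance of the held-out subject is ever in the training indices, each fold measures performance on a previously unseen person.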
The results, averaged over 10 test runs, are shown in Figure 6.3 for the PAMAP2_AR classification task and in Figure 6.4 for the PAMAP2_IE task. The test error rates of the 4 evaluated boosting methods are included in Table 6.2. Compared to the baseline accuracy of the decision tree classifier, all boosting methods significantly improve the performance. The ConfAdaBoost.M1 algorithm clearly outperforms the other methods: on the PAMAP2_AR task, for example, compared to SAMME (whose error rate is nearly identical to that of QuinlanAdaBoost.M1), the test error rate is reduced by about 20%. This reduction of the test error rate is statistically significant with a p-value smaller than 0.001. As discussed in Section 6.4.2, the most significant improvement among all datasets evaluated in this chapter was expected on the PAMAP2_AR classification task, since it represents the largest and most complex classification problem.
[Figure 6.3 plot: test error (0.22 to 0.32) versus boosting iteration (10^0 to 10^3, logarithmic axis).]
Figure 6.3: Test error of the 4 evaluated boosting algorithms on the PAMAP2_AR classification task. The results are averages over 10 test runs.
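The relative reduction of the test error rate on PAMAP2_AR can be checked against the mean error rates of Table 6.2. The helper name `relative_reduction` is hypothetical; the numbers are copied from the table.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline_error - new_error) / baseline_error


# Mean test error rates [%] on PAMAP2_AR, copied from Table 6.2
conf_adaboost, samme, quinlan = 22.22, 27.98, 27.90
print(f"vs. SAMME:   {relative_reduction(samme, conf_adaboost):.1%}")    # 20.6%
print(f"vs. Quinlan: {relative_reduction(quinlan, conf_adaboost):.1%}")  # 20.4%
```

Both comparisons give a reduction of roughly one fifth of the baseline error.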
As in Figure 6.1 and Figure 6.2, the algorithms AdaBoost.M1 and QuinlanAdaBoost.M1 reach the stopping criterion after fewer boosting iterations. However, unlike in the previous section, QuinlanAdaBoost.M1 performs significantly better than AdaBoost.M1 here (especially on the PAMAP2_AR task), confirming that it is worthwhile to apply the confidence-based modification even if only to the prediction step of the original AdaBoost.M1 algorithm, as proposed in [127]. Nevertheless, compared to QuinlanAdaBoost.M1, ConfAdaBoost.M1 reduces the test error rate by 20%. Thus, the major part of the performance improvement achieved by ConfAdaBoost.M1 comes from the confidence-based extension of both the training and the prediction step of the original AdaBoost.M1 algorithm, as also confirmed by the results on the 8 UCI benchmark datasets. ConfAdaBoost.M1 is therefore a clear and significant improvement over QuinlanAdaBoost.M1.
The typical behaviour of boosting as the number of boosting iterations increases follows this scheme: the performance improves and levels off at a certain number of boosting rounds; when the iteration number is increased further, the performance stays at its maximum level and does not decrease. Boosting is thus usually resistant to overfitting. This behaviour has been the topic of much research in the past (e.g. [56, 58, 110]), and only a limited number of examples are known where boosting overfits. All results presented on the various UCI datasets show this advantageous behaviour. ConfAdaBoost.M1 inherits this beneficial characteristic of boosting: it rarely overfits a classification problem. The only result indicating overfitting is on the PAMAP2_AR task (cf. Figure 6.3): after the test error decreases and reaches its minimum at 30 boosting rounds, it slightly increases again as the number of boosting iterations grows. Why overfitting occurs here, and only on the PAMAP2_AR task and only with the ConfAdaBoost.M1 method, is an interesting question that needs further investigation. Nevertheless, even with higher numbers of boosting iterations (e.g. 500 boosting rounds), ConfAdaBoost.M1 performs significantly better than the other evaluated boosting methods.
[Figure 6.4 plot: test error (0.05 to 0.10) versus boosting iteration (10^0 to 10^3, logarithmic axis).]
Figure 6.4: Test error of the 4 evaluated boosting algorithms on the PAMAP2_IE classification task. The results are averages over 10 test runs.
To better understand the results of this section, Table 6.3 presents the confusion matrix of the best performing classifier (ConfAdaBoost.M1 with 30 boosting iterations) on the PAMAP2_AR task. The numbering of the activities in the table corresponds to the activity IDs of the PAMAP2 dataset. The results are averaged over 10 test runs; the overall accuracy is 77.78%. The confusion matrix shows that some activities are recognized with high accuracy, e.g. lie and walk, and that even ascending and descending stairs are distinguished well. The misclassifications in Table 6.3 have several causes. For example, the confusion of over 5% between sit and stand can be explained by the positioning of the sensors: an IMU on the thigh would be needed to reliably differentiate these postures. Moreover, from the point of view of the used set of sensors, ironing has similar characteristics, especially compared to
Table 6.3: Confusion matrix of the PAMAP2_AR classification task using the
ConfAdaBoost.M1 classifier and 30 boosting iterations. The table shows how
different annotated activities are classified in [%].
Annotated activity (rows) vs. recognized activity (columns, by activity ID)
ID activity 1 2 3 4 5 6 7 12 13 16 17 18 19 20 24
1 lie 97.1 1.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9 0.0 0.0
2 sit 2.0 84.8 5.4 0.0 0.0 0.5 0.0 0.0 0.0 0.1 4.1 0.6 2.5 0.0 0.0
3 stand 0.0 6.0 83.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 7.4 0.9 2.4 0.0 0.0
4 walk 0.0 0.0 0.0 92.2 0.0 0.0 0.5 6.8 0.0 0.0 0.0 0.0 0.0 0.4 0.0
5 run 0.0 0.0 0.0 0.0 89.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 10.1 0.0
6 cycle 0.0 0.0 0.0 1.1 0.0 91.7 0.4 0.5 0.0 1.3 0.1 0.0 4.9 0.0 0.0
7 Nordic walk 0.0 0.0 0.0 2.7 0.0 0.0 89.1 1.1 0.1 0.0 0.0 0.0 0.1 7.0 0.0
12 asc. stairs 0.0 0.0 0.0 6.4 0.0 0.2 0.3 87.2 2.6 0.7 0.0 0.0 0.3 2.5 0.0
13 desc. stairs 0.0 0.0 0.0 0.1 0.1 0.0 0.2 6.7 91.3 0.0 0.0 0.0 0.2 0.7 0.7
16 vacuum clean 0.0 0.0 0.1 0.0 0.0 1.1 0.0 0.3 0.4 73.5 1.3 0.3 23.1 0.0 0.0
17 iron 0.0 2.6 0.8 0.0 0.0 0.0 0.0 0.0 0.0 1.2 77.7 5.0 12.7 0.0 0.0
18 fold laundry 0.0 1.1 1.5 0.0 0.0 0.1 0.0 0.0 0.0 8.9 61.1 11.1 16.2 0.0 0.0
19 clean house 0.5 0.6 3.4 0.0 0.0 1.7 0.0 1.2 0.7 21.4 18.1 5.0 47.4 0.0 0.0
20 play soccer 0.0 0.0 0.0 5.1 27.6 1.4 2.8 7.3 20.6 1.7 0.0 0.0 0.1 20.7 12.8
24 rope jump 0.0 0.0 0.0 0.0 32.0 0.0 0.0 1.3 0.1 0.0 0.0 0.0 0.0 7.8 58.8
6.6 Conclusion
This chapter introduced ConfAdaBoost.M1, a confidence-based extension of the well-known AdaBoost.M1 algorithm. The new algorithm builds on established ideas of existing boosting methods, combining some of their benefits. ConfAdaBoost.M1 has been evaluated on various benchmark datasets and compared to the most commonly used boosting techniques. It performed best overall among these algorithms, with significant improvements especially on the larger and more complex physical activity monitoring problems: on the PAMAP2_AR task, the test error rate was reduced by about 20% compared to the second best performing classifier. Therefore, the main motivation for proposing this new boosting variant – namely to overcome some of the challenges defined by recent benchmark results in physical activity monitoring – was successfully achieved.
7.1 Introduction
The previous chapter introduced a novel classification algorithm with the goal of increasing the performance on physical activity monitoring tasks. Although the achieved results were very promising, even the best performing classifier reached an overall accuracy of only 77.78% on the defined complex activity recognition task. The discussion of the respective confusion matrix in Table 6.3 revealed that the main reason for the remaining confusion between different activities is the diversity in how individuals perform these activities. Therefore, to further increase the accuracy of physical activity recognition¹, this chapter introduces and investigates personalization approaches.
¹ Results presented in Section 6.5.2 show good performance on the defined intensity estimation task: the ConfAdaBoost.M1 classifier achieved an overall accuracy of 94.40%. Therefore, of the two main goals of physical activity monitoring systems, this chapter focuses on activity recognition. However, the approaches presented here can also be applied to the intensity estimation task in a straightforward way.
112 7 Personalization of Physical Activity Recognition
also show that the per-user accuracy can vary widely, with the result that the performance for some users is very poor.
Some personalization approaches focus on the feature extraction step of the activity recognition chain (ARC, defined in [145]), e.g. by using heart rate normalized with personal information such as age or resting heart rate (cf. [173] or Section 4.2.3). These personalized features proved to be more valuable than absolute (unnormalized) features: they are preferentially selected in the decision nodes of trained decision tree-based classifiers, as shown in the preliminary studies of Section 4.2.4.
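As an illustration of a heart rate feature normalized with personal information, the sketch below computes the fraction of heart rate reserve used (the inverse of the Karvonen formula), with the common age-based estimate of maximum heart rate, 220 minus age. Both the choice of formula and the function name `heart_rate_reserve` are assumptions; the normalization actually used in Section 4.2.3 may differ.

```python
def heart_rate_reserve(hr: float, resting_hr: float, age: float) -> float:
    """Fraction of heart rate reserve in use (hypothetical normalization).

    Assumes the age-predicted maximum heart rate 220 - age.
    0.0 means resting heart rate, 1.0 means the estimated maximum.
    """
    max_hr = 220.0 - age
    return (hr - resting_hr) / (max_hr - resting_hr)


print(heart_rate_reserve(130.0, resting_hr=60.0, age=30.0))
```

Two users with the same absolute heart rate of 130 bpm can thus obtain different feature values if their age or resting heart rate differs, which is the point of personalizing this feature.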
However, most approaches focus on the classification step of the ARC. A common personalization concept is to adapt the parameters of a previously trained general model to the new user. For example, Pärkkä et al. [120] create a custom decision tree for the recognition of 5 basic activities and change the thresholds of the decision nodes based on labeled data from the new user. In [201], the parameters of a decision tree are updated using the K-means algorithm with unlabeled data from the previously unknown subject. Furthermore, Berchtold et al. [16] use a fuzzy inference system: the new user has to record 1-3 minutes of each activity the system recognizes; with this data, the best classifier is first selected from a set of classifiers and then adapted to the new user's data.
The drawback of changing the parameters of a general model is that either the model is simple (e.g. the decision tree classifiers in [120, 201]), so that only low performance can be expected on more challenging activity recognition tasks, or the general model is complex, which results in infeasible computational costs for mobile applications. Another personalization concept is presented in [105]: based on the physical characteristics of the new user, a subset of users is selected from a dataset of 40 subjects, and only this subset is used to model the physical activities of the new user. The drawbacks of this approach are that a very large original dataset is required to cover all different types of users, and that no significant difference is shown between selecting users based on their physical characteristics and selecting them randomly. The reason is that the physical characteristics of subjects do not necessarily correlate strongly with their movement patterns. Therefore, it is more promising to use activity data directly for the personalization of a general model.
7.2 Algorithms
This section presents the different algorithms used in the experiments. The general model consists of a set of $S$ classifiers created from the original training data. In this chapter, each classifier corresponds to a single subject of the training dataset. However, both the new concept of applying personalization in the decision fusion step of the ARC and the novel algorithm based on this concept can be used with any set of classifiers whose training data differ considerably. Each classifier has the same weight in the general model: $w_i = 1$, $i = 1, \dots, S$. Section 7.2.1 presents different methods based on weighted majority voting that can be applied to retrain the weights of the classifiers. Moreover, Section 7.2.2 introduces a novel algorithm to retrain the weights using new labeled samples. The baseline performance for these approaches is given by Majority Voting (MV), i.e. when no retraining of the weights is performed: to classify a new data instance, each of the equally weighted classifiers of the general model gives a prediction, and the returned activity class is the one with the highest accumulated weight (if several classes share the highest weight, one of them is selected at random).
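The decision fusion rule just described can be sketched as a single weighted vote with random tie-breaking; with all weights equal to 1 it is plain MV, and the weighted variants below reuse it with retrained weights. A minimal sketch, assuming each classifier's prediction is already available as a label; the function name `weighted_vote` is hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, Optional, Sequence


def weighted_vote(predictions: Sequence[Hashable],
                  weights: Optional[Sequence[float]] = None,
                  rng: Optional[random.Random] = None) -> Hashable:
    """Return the class with the highest accumulated weight.

    With weights=None every classifier has weight 1 (plain majority
    voting). Ties are broken uniformly at random.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    rng = rng or random.Random()
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    best = max(totals.values())
    winners = [label for label, total in totals.items() if total == best]
    return rng.choice(winners)


print(weighted_vote(["walk", "walk", "run"]))                   # walk
print(weighted_vote(["walk", "run", "run"], [3.0, 1.0, 1.0]))   # walk
```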
tion about how well the experts perform on the new subject’s data, no assumptions
can be made about the quality of the predicted activity labels for the previously unknown subject. However, the methods presented below pursue the natural goal of performing at least nearly as well as the best expert of the general model.
The first approach used in the experiments of this chapter is the Weighted Majority Algorithm (WMA), described by Blum [20]. In WMA, for each of the $N$ new training samples,
\[ w_i \leftarrow \tfrac{1}{2}\, w_i \tag{7.1} \]
if the $i$th classifier predicts the label incorrectly; otherwise $w_i$ remains unchanged. A new data instance is predicted as with MV, but using the adjusted weights $w_i$. Blum [20] also gives an upper bound on the number $M$ of mistakes made by WMA:
\[ M \le 2.41\,(m + \log_2 S) \tag{7.2} \]
where $m$ is the number of mistakes made by the best expert and $S$ is the number of experts.
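The WMA weight update of Eq. (7.1) and Blum's bound of Eq. (7.2) can be sketched as follows. The sketch assumes the predictions of all experts on the new labeled samples are given as a matrix, and that retraining starts from the uniform weights of the general model; the function names are hypothetical. Note that Blum's bound concerns the mistakes of the combined vote while the samples are processed online.

```python
import math
from typing import Hashable, List, Optional, Sequence


def wma_retrain(expert_preds: Sequence[Sequence[Hashable]],
                labels: Sequence[Hashable],
                weights: Optional[List[float]] = None) -> List[float]:
    """Halve the weight of every expert that mislabels a sample (Eq. 7.1).

    expert_preds[t][i] is the prediction of expert i on new sample t.
    """
    n_experts = len(expert_preds[0])
    w = list(weights) if weights is not None else [1.0] * n_experts
    for preds, y in zip(expert_preds, labels):
        for i, p in enumerate(preds):
            if p != y:
                w[i] *= 0.5
    return w


def wma_mistake_bound(m: int, n_experts: int) -> float:
    """Blum's bound M <= 2.41 (m + log2 S) on WMA's mistakes (Eq. 7.2)."""
    return 2.41 * (m + math.log2(n_experts))


preds = [["a", "a", "b"],   # sample 1: the third expert is wrong
         ["b", "a", "b"]]   # sample 2: the second expert is wrong
print(wma_retrain(preds, ["a", "b"]))  # [1.0, 0.5, 0.5]
```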
A modified version of WMA is the Randomized Weighted Majority Algorithm (RWMA), also presented in [20]. In this algorithm,
\[ w_i \leftarrow \beta\, w_i \tag{7.3} \]
is applied when the $i$th expert predicts a label incorrectly (a good choice for $\beta$ is given below). The upper bound on the expected number $M$ of mistakes made by RWMA, depending on the parameters $m$, $S$ and $\beta$, is
\[ M \le \frac{m \ln(1/\beta) + \ln S}{1 - \beta} \tag{7.4} \]
(the proof can be found in [20]). Using this upper bound, Schapire [154] proposes to set $\beta$ dynamically as follows:
\[ \beta = \frac{1}{1 + \sqrt{\dfrac{2 \ln S}{m^{*}}}} \tag{7.5} \]
where $m^{*}$ is the number of mistakes made by the best classifier while the $N$ labeled samples are processed sequentially. The modified weights $w_i$ are used to predict a new instance: the prediction of a single selected classifier is used, where the $i$th classifier is selected with probability $w_i / W$, $W$ being the sum of all weights. Although the upper bound given for RWMA is lower than that for WMA, the practical use of this modification is questionable: RWMA implies that, although the best expert of the ensemble is known, this expert should be selected only some of the time, while at other times experts known to be worse are relied on. Experiments presented later in Section 7.3.2 support this statement, showing that WMA generally performs better than RWMA.
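A sketch of RWMA retraining and prediction follows. As a simplifying assumption, $\beta$ is computed once from Eq. (7.5) using the final $m^{*}$ after all $N$ samples have been processed, and the case $m^{*} = 0$ is mapped to the limit $\beta = 0$; applying Eq. (7.3) once per mistake is then equivalent to $w_i = \beta^{k_i}$ for $k_i$ mistakes. All function names are hypothetical.

```python
import math
import random
from typing import Hashable, List, Sequence, Tuple


def schapire_beta(m_star: int, n_experts: int) -> float:
    """beta = 1 / (1 + sqrt(2 ln S / m*)) as in Eq. (7.5).

    For m* = 0 (a perfect expert) the formula tends to 0, which is returned.
    """
    if m_star == 0:
        return 0.0
    return 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_experts) / m_star))


def rwma_retrain(expert_preds: Sequence[Sequence[Hashable]],
                 labels: Sequence[Hashable]) -> Tuple[List[float], float]:
    """Apply w_i <- beta * w_i once per mistake of expert i (Eq. 7.3)."""
    n_experts = len(expert_preds[0])
    mistakes = [0] * n_experts
    for preds, y in zip(expert_preds, labels):
        for i, p in enumerate(preds):
            mistakes[i] += p != y
    beta = schapire_beta(min(mistakes), n_experts)
    # Multiplying by beta once per mistake equals beta ** (number of mistakes).
    return [beta ** k for k in mistakes], beta


def rwma_predict(expert_labels: Sequence[Hashable], weights: Sequence[float],
                 rng: random.Random) -> Hashable:
    """Follow one expert, chosen with probability w_i / W."""
    if sum(weights) == 0:
        return expert_labels[rng.randrange(len(expert_labels))]
    return rng.choices(expert_labels, weights=weights)[0]


w, beta = rwma_retrain([["a", "b"], ["b", "a"], ["a", "a"]], ["a", "a", "a"])
print(beta, w, rwma_predict(["walk", "run"], w, random.Random(1)))
```

The random selection in `rwma_predict` makes the criticism above concrete: even an expert with the highest weight is followed only with probability proportional to its weight.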
Another approach presented here is Weighted Majority Voting (WMV) [85]. In this algorithm, a classifier's weight depends only on its performance $p_i$ on the $N$ labeled samples:
\[ w_i = \log_{10} \frac{p_i}{1 - p_i} \tag{7.6} \]
Prediction with WMV works as with the MV and WMA methods.
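A sketch of the WMV weights of Eq. (7.6), assuming $p_i$ is the accuracy of expert $i$ on the $N$ new labeled samples. Clipping $p_i$ away from 0 and 1 is an added assumption to avoid infinite weights; note also that an expert worse than chance ($p_i < 0.5$) receives a negative weight. The function name `wmv_weights` is hypothetical.

```python
import math
from typing import Hashable, List, Sequence


def wmv_weights(expert_preds: Sequence[Sequence[Hashable]],
                labels: Sequence[Hashable], eps: float = 1e-6) -> List[float]:
    """w_i = log10(p_i / (1 - p_i)), p_i = accuracy of expert i (Eq. 7.6)."""
    n_samples, n_experts = len(labels), len(expert_preds[0])
    weights = []
    for i in range(n_experts):
        correct = sum(preds[i] == y for preds, y in zip(expert_preds, labels))
        # Assumption: clip p_i to [eps, 1 - eps] to avoid infinite weights.
        p = min(max(correct / n_samples, eps), 1.0 - eps)
        weights.append(math.log10(p / (1.0 - p)))
    return weights


# Expert 1 labels 9 of 10 samples correctly, expert 2 only 5 of 10
preds = [["a", "a"]] * 5 + [["a", "b"]] * 4 + [["b", "b"]]
print(wmv_weights(preds, ["a"] * 10))  # roughly [0.954, 0.0]
```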
In the experiments of Section 7.3.2, the methods MV, WMA, RWMA and WMV described above are compared to each other and to the novel algorithm presented in the next subsection. Further methods based on the idea of weighted majority voting exist, but they do not fulfill all the specifications given in Section 7.1.2. For example, Stefano et al. [164] present another variant of the weighted majority vote rule. After training a set of experts for the general model, they define the search for the optimal weights $w_i$ as a global problem (i.e. maximizing the performance of the whole set of experts) and apply a Genetic Algorithm (GA). Their experiments on a handwritten digit recognition problem show that the proposed approach outperforms the traditional weighted majority voting approach, in which the weights are obtained from each single expert's performance only. However, Stefano et al. [164] also state that their approach requires a very high computational cost; it is thus not feasible for online mobile activity monitoring applications.
7.3 Experiments
This section first describes the basic conditions of the experiments, including the definition of the activity recognition classification tasks and the choice of evaluation technique and performance measures. Afterwards, different aspects of the suggested personalization approaches are analyzed. Results of a thorough evaluation of the proposed general concept and the novel DE algorithm are presented and discussed.
[Figure 7.1 plot panels (‘basic’ task): F-measure and accuracy with different methods (top); accuracy of the best and of the worst subject (bottom). x-axis: number of extra training samples (10 to 100); curves: MV, DE, WMA, RWMA, WMV.]
Figure 7.1: Performance measures on the ‘basic’ task depending on the number
of extra training samples per each activity class. Top row: the overall perfor-
mance measures F-measure and accuracy. Bottom row: the highest and lowest
individual subject accuracy.
The experiments first determine a trade-off for the number of extra training samples required to retrain the weights of the general model. Then the different algorithms are compared on both the ‘basic’ and the ‘extended’ classification task. Finally, the practical scenario in which new labeled data is available from only a subset of activities is investigated.
[Figure 7.2 plot panels (‘extended’ task): F-measure and accuracy with different methods (top); accuracy of the best and of the worst subject (bottom). x-axis: number of extra training samples (10 to 100); curves: MV, DE, WMA, RWMA, WMV.]
Figure 7.2: Performance measures on the ‘extended’ task depending on the num-
ber of extra training samples per each activity class. Top row: the overall perfor-
mance measures F-measure and accuracy. Bottom row: the highest and lowest
individual subject accuracy.
without retraining the general model. The results with the different weighted majority voting-based algorithms suggest that the major increase in overall performance (cf. the plots of overall F-measure and accuracy) is achieved with just 10 extra training samples per activity. However, the lowest individual accuracy (cf. the bottom right plot in both figures) increases significantly with more new training data, so it is worthwhile to use a higher number of extra training samples. On the other hand, the more new training samples are required, the longer it takes to record labeled data and to retrain the weights for the new subject. Therefore, 60 extra training samples per activity are selected as a trade-off and used in the rest of this chapter. This means that the new subject has to record 1 minute of labeled data from each activity, since the applied data processing chain uses a sliding window of 1 second. This additional recording effort for the new user is comparable to, or even less than, that required in various previous works, as presented in Section 7.1.1.
[Figure plot panels (caption not recovered): precision, recall, F-measure and accuracy of the AdaBoost.M1 classifier on the ‘basic’ task (top) and on the ‘extended’ task (bottom), with MV (left) and DE (right). x-axis: number of boosting iterations (10^0 to 10^3, logarithmic axis).]
Table 7.2: Performance measures on the ‘basic’ task with AdaBoost.M1 classifier (100 boosting iterations) and 60 extra training samples per activity class. The results are averaged over 10 test runs; mean and standard deviation are given for each experimental setup.
Precision Recall F-measure Accuracy Best subject Worst subject
MV 87.67 ± 1.45 80.77 ± 2.15 84.07 ± 1.73 86.20 ± 1.66 99.35 ± 1.51 66.25 ± 6.36
WMV 89.77 ± 5.05 86.71 ± 3.68 88.20 ± 4.27 89.91 ± 4.07 99.59 ± 0.44 71.65 ± 25.44
WMA 89.42 ± 3.52 85.32 ± 2.96 87.31 ± 3.01 89.33 ± 2.52 99.61 ± 0.35 77.56 ± 8.34
RWMA 87.11 ± 4.07 82.82 ± 4.76 84.87 ± 4.10 87.85 ± 2.98 99.01 ± 1.21 75.00 ± 8.08
DE 92.06 ± 1.46 87.00 ± 2.53 89.46 ± 1.98 91.07 ± 1.69 99.96 ± 0.07 76.91 ± 5.28
Table 7.3: Performance measures on the ‘extended’ task with decision tree classifier and 60 extra training samples per activity class. The results are averaged over 10 test runs; mean and standard deviation are given for each experimental setup.
Precision Recall F-measure Accuracy Best subject Worst subject
MV 51.11 ± 1.50 48.46 ± 1.16 49.74 ± 1.24 56.36 ± 1.48 80.97 ± 5.10 8.19 ± 3.66
WMV 58.88 ± 2.20 54.47 ± 1.38 56.58 ± 1.48 59.69 ± 1.46 86.47 ± 3.29 35.30 ± 9.02
WMA 52.93 ± 2.66 51.50 ± 2.66 52.17 ± 2.27 54.02 ± 1.41 75.84 ± 5.94 39.69 ± 6.02
RWMA 48.99 ± 2.61 47.78 ± 3.60 48.35 ± 2.97 50.83 ± 2.43 71.50 ± 3.76 35.21 ± 11.54
DE 68.21 ± 1.70 64.80 ± 1.55 66.46 ± 1.54 67.78 ± 1.59 90.56 ± 1.65 50.67 ± 5.20
Table 7.4: Performance measures on the ‘extended’ task with AdaBoost.M1 classifier (100 boosting iterations) and 60 extra training samples per activity class. The results are averaged over 10 test runs; mean and standard deviation are given for each experimental setup.
Precision Recall F-measure Accuracy Best subject Worst subject
MV 62.62 ± 1.06 59.61 ± 1.24 61.08 ± 1.07 65.32 ± 1.10 91.70 ± 1.64 18.55 ± 2.70
WMV 47.90 ± 3.12 55.00 ± 3.30 51.20 ± 3.18 54.58 ± 3.37 96.79 ± 0.54 15.22 ± 9.82
WMA 73.11 ± 1.99 67.19 ± 2.35 70.02 ± 2.10 68.13 ± 1.56 89.95 ± 4.01 53.62 ± 2.84
RWMA 69.36 ± 4.23 64.37 ± 2.35 66.74 ± 2.83 65.71 ± 1.43 91.44 ± 2.65 54.14 ± 8.87
DE 76.83 ± 1.09 72.14 ± 1.22 74.41 ± 1.14 74.79 ± 1.13 96.03 ± 1.25 56.14 ± 2.80
7.4 Computational Complexity 123
(bottom row, left and right plots presenting again the results with the MV and DE algorithm, respectively): increasing the number of boosting iterations significantly improves each of the performance measures. Note that the AdaBoost.M1 results in Table 7.2 and Table 7.4 are all given with 100 boosting rounds, since the performance already levels off at this iteration number.
The presented results clearly show that retraining the weights of a general model is a valid approach for the personalization of physical activity recognition. Compared to the baseline performance of MV, the overall performance measures of the weighted majority voting algorithms are at least comparable, while the lowest individual performance increases significantly. This holds consistently for WMA and RWMA. WMV achieves the best overall performance among the existing approaches in Table 7.2 and Table 7.3, but with AdaBoost.M1 on the ‘extended’ task all its measures except the best subject's accuracy fall below those of MV (Table 7.4). Moreover, the novel DE algorithm clearly outperforms all existing methods, both in overall performance and in increasing the worst subject's performance. Therefore, this new method is a very promising approach for personalization.
[Figure 7.4 plot panels (DT classifier, ‘basic’ task): F-measure and accuracy (top); accuracy of the best and of the worst subject (bottom). x-axis: number of retraining activities (0 to 6); curves: DE, WMA, RWMA, WMV.]
Figure 7.4: Performance measures on the ‘basic’ task depending on the number
of activities (0 − 6) from which new data is available to retrain the general model.
Top row: the overall performance measures F-measure and accuracy. Bottom
row: the highest and lowest individual subject accuracy.
ing its computational complexity. The results presented in Section 7.3 show that complex classifiers (e.g. AdaBoost.M1) can be personalized with both the new concept and the novel DE algorithm. Therefore, this section analyzes the computational complexity of the proposed methods on mobile systems.
An empirical study was designed and carried out to investigate the feasibility of the new personalization approach for mobile physical activity recognition. The study uses the mobile system described in Section 8.3.1, i.e. wearable wireless sensors and a Samsung Galaxy S III smartphone (with a 1.4 GHz quad-core Cortex-A9 CPU and 1 GB of RAM). The procedure of the study is as follows. First, data was recorded from one subject in 6 sessions, each including the following 7 activities: lying, sitting, standing, walking, running, ascending and descending stairs. These recordings were used to create the general model, consisting of 6 classifiers (each trained on the data of one of the sessions). Decision tree, AdaBoost.M1 and ConfAdaBoost.M1 (the latter two with a decision tree as base-level classifier) were used and compared as classifiers in the general model. Then, labeled data was recorded from a second subject performing each of the 7 activities for approximately one minute (as defined in Section 7.3.2). This new data was used to retrain the weights of the 6 classifiers of the general model. The retraining was performed directly on the smartphone for each of the 4 analyzed algorithms. For each classifier–personalization algorithm pair, the retraining was run 5 times; the results presented are the averages of these test runs. Finally, to compare the performance of the different methods using the mobile system, the second subject also recorded data for offline evaluation, performing each of the 7 activities for approximately three minutes. Note, however, that the main goal of this empirical study was not to compare the classification accuracy of the methods, but to analyze and compare the computational time of the proposed algorithms.
[Figure 7.5 plot panels (DT classifier, ‘extended’ task): F-measure and accuracy (top); accuracy of the best and of the worst subject (bottom). x-axis: number of retraining activities (0 to 15); curves: DE, WMA, RWMA, WMV.]
Figure 7.5: Performance measures on the ‘extended’ task depending on the number of activities (0 − 15) from which new data is available to retrain the general model. Top row: the overall performance measures F-measure and accuracy. Bottom row: the highest and lowest individual subject accuracy.
Table 7.5: Computational time [s] required for the retraining of the general
model. Decision tree, AdaBoost.M1 and ConfAdaBoost.M1 are each tested as
classifiers used in the general model. The proposed personalization approach is
evaluated with the weighted majority voting-based methods WMV, WMA and
RWMA, and the novel DE algorithm.
Classifier WMV WMA RWMA DE
Decision tree 4.01 3.89 4.01 4.05
AdaBoost.M1 10.91 11.03 10.82 10.84
ConfAdaBoost.M1 30.89 30.48 31.01 31.00
Table 7.6: Classification accuracy [%] of each of the majority voting-based algo-
rithms and each type of classifier applied in the general model.
Classifier MV WMV WMA RWMA DE
Decision tree 84.64 87.61 92.44 92.44 88.54
AdaBoost.M1 80.14 86.08 84.38 84.38 85.74
ConfAdaBoost.M1 84.97 92.70 92.28 92.28 92.19
Table 7.5 presents the average retraining time of each of the weighted majority
voting algorithms and each type of classifier. These results answer the following
question: after the second subject had recorded the required new training data and
started the retraining of the general model on the smartphone, how long did he have
to wait to receive his personalized model? For each of the retraining algorithms, most
of the computational time is spent predicting the label of each new sample with each
of the general model’s classifiers (by far the most computationally intensive part of
which is the feature calculation in the DPC). With an efficient implementation, this
has to be done exactly once for each sample-classifier pair, even when applying the DE
algorithm. The retraining time of all 4 algorithms should therefore be similar, as
confirmed by the results of Table 7.5. Furthermore, retraining takes longest when the
general model consists of ConfAdaBoost.M1 classifiers, since this classifier is the
most complex and thus requires the calculation of the most features. Nevertheless,
a retraining time of approximately 30 seconds is still acceptable: the new user
receives a complex personalized system after waiting only half a minute, which is far
below the time required by the approaches presented in related work.
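Why the retraining time barely depends on the weighting algorithm can be illustrated with a minimal Python sketch (the prototype itself was written in Java). It separates the expensive step, classifying every new sample once with every expert, from the cheap weight update that only reads the cached predictions. As a stand-in update rule it uses the textbook Weighted Majority Algorithm of Littlestone and Warmuth (the weight of each expert that errs is multiplied by a factor beta); the exact WMV, WMA, RWMA and DE rules of this chapter may differ, and the experts below are hypothetical toy functions rather than trained classifiers.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Label = str
Classifier = Callable[[Sequence[float]], Label]


def predict_once(classifiers: List[Classifier],
                 samples: List[Sequence[float]]) -> List[List[Label]]:
    """Expensive step: every (sample, classifier) pair is evaluated exactly once.

    On the smartphone this cost is dominated by the feature calculation."""
    return [[clf(x) for x in samples] for clf in classifiers]


def retrain_weights_wma(predictions: List[List[Label]], labels: List[Label],
                        beta: float = 0.5) -> List[float]:
    """Cheap step: Weighted Majority Algorithm update on the cached predictions.

    Each expert starts with weight 1.0 and is multiplied by beta for every
    new sample it mislabels; no classifier is evaluated again here."""
    weights = [1.0] * len(predictions)
    for i, true_label in enumerate(labels):
        for k, expert_predictions in enumerate(predictions):
            if expert_predictions[i] != true_label:
                weights[k] *= beta
    return weights


def weighted_vote(classifiers: List[Classifier], weights: List[float],
                  x: Sequence[float]) -> Label:
    """Personalized model: each expert adds its weight to the class it predicts."""
    scores: Dict[Label, float] = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)


# Hypothetical toy experts standing in for the general model's classifiers.
experts: List[Classifier] = [
    lambda x: "walk" if x[0] > 0.5 else "sit",
    lambda x: "walk",
    lambda x: "sit",
]
new_samples = [[0.9], [0.1], [0.8], [0.2]]
new_labels = ["walk", "sit", "walk", "sit"]

cache = predict_once(experts, new_samples)      # done once per new user
weights = retrain_weights_wma(cache, new_labels)
print(weights)                                  # [1.0, 0.25, 0.25]
print(weighted_vote(experts, weights, [0.3]))   # sit
```

Because the weight update only reads `cache`, trying several weighting schemes on the same new data adds almost nothing to the retraining time, which is consistent with the near-identical columns of Table 7.5.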
Table 7.6 shows the classification accuracy for each of the majority voting algorithms
and each type of classifier. These results were achieved with the different personalized
models on the data recorded by the second subject for offline evaluation purposes
(approximately three minutes for each of the 7 activities). Since the amount of data
used for this evaluation is rather small, no statistically significant conclusions can
be drawn. Nevertheless, these results serve as a proof of concept, showing that the
novel personalization concept has been realized and successfully trained on the proposed
mobile system. When decision tree classifiers are used, the accuracy of the general
model’s individual classifiers ranges between 45.08% and 92.44%; in the case of
AdaBoost.M1 it ranges between 45.08% and 90.49%, and in the case of ConfAdaBoost.M1 it ranges
7.5 Conclusion
This chapter presented a novel general concept for the personalization of physical
activity recognition applications. This concept uses a set of classifiers as the general
model and retrains the weights of the classifiers using new labeled data from a
previously unknown subject. Results with different methods based on this concept (using
the WMA, RWMA and WMV algorithms) show that it is a valid approach. Moreover, a novel
algorithm (DE) is presented and compared to the existing methods, further increasing the
performance of the personalized system. These findings are confirmed by a thorough
evaluation on two activity recognition classification tasks, which also compares
decision tree and boosted decision tree classifiers as the experts of the general model.
The main benefit of the introduced concept is that, instead of retraining the
classifier(s) of a general model, only their weights are retrained. This is much less
computationally intensive, since essentially only the classification of the new training
samples by each classifier is required. Therefore, the proposed approach can also be
used on mobile systems, even for complex classification tasks requiring more complex
classifiers (cf. the ‘extended’ task with the AdaBoost.M1 classifier). An analysis of the
computational complexity of the new personalization concept shows its feasibility for
online mobile applications: a new user receives the personalized model within a short
time. Moreover, the proposed concept allows the new user to record data from only a
subset of the recognized activities, making the approach more practicable.
Physical activity monitoring systems are usually trained on a user group of young,
healthy adults. Without personalization, such applications perform poorly when used by
subjects who differ significantly from this group, e.g. overweight or elderly subjects.
Future work is planned to investigate how well personalized systems perform in these
situations, and how much improvement the proposed personalization concept and the
novel DE algorithm achieve compared to applying only a general model.
8 Physical Activity Monitoring Systems
8.1 Introduction
The previous chapters presented novel algorithms to improve the classification
performance of physical activity monitoring systems. This chapter describes how an
actual activity monitoring system can be created and discusses various issues
concerning such systems.
The main goal of this chapter is to create a mobile, unobtrusive activity monitoring
system. For collecting inertial and physiological data, small, lightweight and wireless
sensor units should be used, as described in Section 8.3.1. Moreover, as few sensor
positions as possible should be utilized on the user’s body. A thorough analysis of this
issue for both intensity estimation and activity recognition is given in Section 8.2.
For processing the collected data, current smartphones are chosen for the final
prototype, as discussed in Section 8.3.1. Two feasibility studies are carried out in
Section 8.3.2 to show that the computational power of such mobile devices is sufficient
for applying complex classification algorithms. Moreover, the mobile device is also used
to provide feedback to the user by visualizing the results, as described in Section 8.3.3.
A further goal of this chapter is the integration of the described mobile system
into a full healthcare application for aerobic activity monitoring and support in daily
life. The major components of such an overall system and how they interact with each
other are presented in Section 8.4. Finally, the chapter is summarized in Section 8.5.
Table 8.1: Modular activity monitoring system: results on the intensity estimation
task with different combinations of the sensors.
chest IMU arm IMU foot IMU heart rate Performance [%]
X 90.47
X 86.47
X 88.08
X 82.06
X X 94.37
X X 93.07
X X 91.36
X X X 94.07
X X X X 95.65
racy of 95% with LOSO protocol using 5 sensor positions, while with only 2 sensors
(wrist and ankle location) the accuracy still reached 88%.
In this thesis, both intensity estimation and activity recognition are considered.
Results achieved on these tasks with different combinations of sensors are presented
in Section 8.2.1 and Section 8.2.2, respectively. Moreover, the results motivate the
concept of a modular activity monitoring system. As a concrete example, a system
consisting of three modules is described here. Different combinations of these modules
provide different functionality: 1) a coarse intensity estimation of physical
activities, 2) different features based on heart rate data, and 3) the recognition of
basic activities and postures.
All results presented below are based on the PAMAP dataset, which provides data from
three IMUs (located on a subject’s arm, chest and foot) and a heart rate monitor, cf.
Section 3.2. The data processing chain defined in Section 4.2 is used. The boosted
decision tree classifier is applied in this chapter, since it performed best in the
preliminary studies of Section 4.2.4. Furthermore, the leave-one-subject-out (LOSO)
8-fold cross-validation protocol is applied. All experiments are performed using the
Weka toolkit [65].
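To make the LOSO protocol concrete, the following sketch generates its folds from a per-sample list of subject IDs: with 8 subjects it yields 8 folds, and each subject is tested exactly once on a model trained only on the other subjects. The function name and the flat index representation are illustrative assumptions; the actual experiments were run in Weka.

```python
from typing import Iterator, List, Sequence, Tuple


def loso_folds(subject_ids: Sequence[int]) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Leave-one-subject-out cross-validation: one fold per subject.

    Yields (held-out subject, training indices, test indices). The held-out
    subject's samples never appear in the training indices of its fold."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# Toy example: 8 subjects with 3 window segments each.
ids = [subject for subject in range(1, 9) for _ in range(3)]
folds = list(loso_folds(ids))
print(len(folds))  # 8
```

Splitting by subject rather than by sample is what makes the reported accuracies an estimate of performance on previously unseen users.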
Table 8.2: Modular activity monitoring system: results on the activity recognition
task with different combinations of the sensors.
chest IMU arm IMU foot IMU heart rate Performance [%]
X 83.36
X 73.55
X 74.67
X 45.64
X X 83.85
X X 77.55
X X 76.45
X X X 88.11
X X X 81.70
X X X 89.95
X X X 88.90
X X X X 90.65
(on arm and foot placement) further improvement on intensity estimation is obtained.
However, it is questionable whether it is worth using two extra sensors for only a
minor improvement in performance. On the other hand, if the two extra accelerometers
are required for other tasks of an activity monitoring system (e.g. for activity
recognition), features derived from them can be used for intensity estimation as well.
Moreover, features combining different sensor locations (e.g. the weighted sum of the
absolute integral of all three accelerometers, cf. Section 4.2.3) can be applied too.
Therefore, if synchronized data from different sensor placements is available, it is
worth extracting and investigating features computed from multiple sensor locations for
the intensity estimation task.
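To make the multi-location feature concrete, the following sketch computes a weighted sum of absolute integrals over synchronized windows of the three accelerometers. It assumes that the absolute integral of one accelerometer is the rectangle-rule integral of |a| summed over its three axes, and the per-placement weights are hypothetical; the exact definition used in this thesis is the one given in Section 4.2.3.

```python
from typing import Dict, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (x, y, z) acceleration sample


def absolute_integral(window: Sequence[Sample], dt: float) -> float:
    """Rectangle-rule integral of |a| over one window, summed over the three axes."""
    return sum(abs(a) for sample in window for a in sample) * dt


def weighted_abs_integral(windows: Dict[str, Sequence[Sample]],
                          weights: Dict[str, float], dt: float) -> float:
    """Combine synchronized windows from several placements into one feature."""
    return sum(weights[pos] * absolute_integral(win, dt)
               for pos, win in windows.items())


dt = 0.01  # assumed 100 Hz sampling
windows = {
    "chest": [(1.0, 0.0, 0.0), (0.0, -1.0, 0.0)],
    "arm": [(2.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
    "foot": [(0.0, 3.0, 0.0), (0.0, 0.0, -3.0)],
}
weights = {"chest": 0.5, "arm": 0.25, "foot": 0.25}  # hypothetical weights
print(weighted_abs_integral(windows, weights, dt))
```

Such a feature is only meaningful when the windows of the different sensors cover the same time span, which is why synchronized data is a precondition.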
The results in Table 8.1 also indicate that, in contrast to the conclusion of [173],
heart rate information combined with accelerometers improves the intensity estimation
of physical activities compared to systems relying only on inertial data. This is
especially true for walking-like activities of light or vigorous effort. Without the
HR-monitor, the intensity estimation performs poorly on the activities very slow walk
and ascend stairs. The reason is that the characteristics of these activities overlap
with normal walk if only features extracted from accelerometer data are considered.
This justifies the need for features extracted from physiological measurements, e.g.
from heart rate data. However, the results of Table 8.1 also show that heart rate
information alone is not sufficient for a reliable intensity estimation.
the usage of the two extra accelerometers is justified: The overall performance can be
increased to 90.65%.
An interesting conclusion from the results of Table 8.2 (based on the setups containing
two IMUs and the HR-monitor) is that the chest and foot IMU placements behave similarly
for activity recognition, while the arm IMU placement is complementary. Comparing the
activity types of misclassified samples with and without the arm IMU reveals that normal
walk and Nordic walk effectively cannot be distinguished without the arm IMU.
8.2.3 Conclusion
With recent progress in wearable sensing, the number of commercially available activity
monitoring products is increasing. Most of these products include a single sensor worn
on the user’s body (e.g. as a bracelet, on the belt or directly integrated in a mobile
device) and focus on a few goals, usually related to the assessment of energy
expenditure. Studies underline the good accuracy of some of these systems, e.g. the
Actiheart [36] or the SenseWear [82] system. However, the requirements placed on an
activity monitoring system vary. Some of the above-mentioned products introduce
additional functionality, e.g. the assessment of sleep duration and efficiency in the
SenseWear system. However, these systems cannot be extended if, e.g., higher accuracy or
more information is required to add further functionality related to physical activity
monitoring.
The results presented in Section 8.2.1 and Section 8.2.2 indicate that different sets of
sensors are required for different physical activity monitoring tasks. This motivates
the idea of a modular activity monitoring system, in which functionality is added or
removed by adding or removing sensors. The rest of this section describes an extensible
physical activity monitoring system based on this idea: starting from a simple system
for the intensity estimation of physical activities, a more detailed description of
daily activities can be acquired with one or two additional sets of sensors.
The basic system consists of only one accelerometer worn on the chest, which delivers a
reliable coarse intensity estimation of physical activities, cf. Table 8.1. Adding a
heart rate monitor provides two benefits compared to the basic system: 1) a
significantly improved intensity estimation and 2) new functionality based on the
obtained heart rate information. The first benefit is justified by the results of
Table 8.1: the performance increased by approximately 4 percentage points with the
additional heart rate monitor (from 90.47% to 94.37%). As for the second benefit,
monitored heart rate can extend the functionality of an activity monitoring system in
many ways. For cardiac patients, for example, an individual HR limit could be defined
in the system, and an alarm would be triggered when this value is exceeded. For sports
applications, a desired HR range can be defined, and the system can determine how much
time was spent in this heart rate zone to optimize the benefits of a workout.
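The cardiac-patient alarm described above reduces to a threshold test on the incoming heart rate stream. A minimal sketch, assuming the heart rate arrives as a sequence of beats-per-minute values and that an alarm should fire once per episode above the limit rather than on every sample; the function name and the limit value are illustrative only.

```python
from typing import List, Sequence


def hr_alarm_indices(hr_bpm: Sequence[float], limit_bpm: float) -> List[int]:
    """Indices at which the heart rate rises above an individually set limit.

    Only upward crossings raise an alarm, so one long episode above the
    limit triggers a single alarm rather than one per sample."""
    alarms: List[int] = []
    above = False
    for i, hr in enumerate(hr_bpm):
        if hr > limit_bpm and not above:
            alarms.append(i)
        above = hr > limit_bpm
    return alarms


hr_trace = [80, 95, 125, 130, 110, 128, 90]   # toy heart rate samples (bpm)
print(hr_alarm_indices(hr_trace, 120))        # [2, 5]
```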
Finally, adding two extra accelerometers (arm and foot placement) to the basic or the
HR-monitor extended system enables the recognition of basic activities and postures,
besides further improving the intensity estimation. This module is justified by the
results shown in Table 8.2: an accuracy of 88.90% and 90.65% was achieved on the
activity recognition task with the 3 IMUs and with the 3 IMUs plus the heart rate
monitor, respectively.
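The modular idea can be captured as a simple mapping from the set of worn sensors to the functionality they unlock. The sketch below encodes the three modules described above; treating the extension modules as add-ons that require the chest IMU base module is an assumption of this sketch, as are the enum and function names.

```python
from enum import Enum
from typing import Set


class Sensor(Enum):
    CHEST_IMU = "chest IMU"
    ARM_IMU = "arm IMU"
    FOOT_IMU = "foot IMU"
    HR_MONITOR = "heart rate monitor"


def available_functions(sensors: Set[Sensor]) -> Set[str]:
    """Functionality provided by a combination of modules.

    Assumption of this sketch: the extension modules are only used on top of
    the base module (chest IMU), mirroring the description in the text."""
    functions: Set[str] = set()
    if Sensor.CHEST_IMU not in sensors:
        return functions
    functions.add("coarse intensity estimation")               # base module
    if Sensor.HR_MONITOR in sensors:                           # HR module
        functions.update({"improved intensity estimation", "heart rate features"})
    if {Sensor.ARM_IMU, Sensor.FOOT_IMU} <= sensors:           # activity module
        functions.update({"improved intensity estimation",
                          "activity and posture recognition"})
    return functions


print(sorted(available_functions({Sensor.CHEST_IMU, Sensor.HR_MONITOR})))
```

Adding a further module, such as the upper-body tracking mentioned as a possible extension, would amount to one more sensor set and one more conditional block.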
In summary, this section presented the idea of a modular system for physical activity
monitoring: a base module is responsible for the basic system functionality (intensity
estimation in the concrete example of this section), while two further modules can be
added, separately or together, to extend the functionality of the system. Following the
idea of modularity, additional modules could be defined. One possible extension of the
presented system is a module providing full upper-body tracking, which would enable the
monitoring of muscle-strengthening activities besides the already provided monitoring of
aerobic activities.
Overall, although the PAMAP2 prototype was generally feasible for data collection,
some drawbacks still limit the system’s usability in everyday life. Further improvement
is therefore required, especially concerning the collection unit and the size of the
sensor units. The next subsection presents a state-of-the-art prototype for mobile
physical activity monitoring. It is based on commercially available, widely used
sensors and an Android smartphone, making it more acceptable for everyday use.
As the mobile control unit, a standard Android smartphone is proposed for the final
prototype. The main tasks of the control unit are data processing and visualization of
the results. A smartphone is preferable for these tasks, since no additional device is
then required as the control unit: most users carry a smartphone during their daily
routine anyway. Moreover, these devices support wireless data transfer via Bluetooth,
so the selected sensors (Shimmer units and the Zephyr heart rate monitor) can stream the
collected data directly to the control unit for online processing, without any
additional hardware component (e.g. a dongle). The Android operating system was
selected for its convenient application development, good support for external devices
(cf. e.g. the Android instrument driver from Shimmer) and its large developer community.
Two Android smartphones were tested within the final prototype, namely the Google
Nexus S (available since December 2010, with a 1GHz single-core ARM Cortex-A8 CPU and
512MB of RAM) and the Samsung Galaxy S III (available since May 2012, with a 1.4GHz
quad-core Cortex-A9 CPU and 1GB of RAM).
The entire data processing chain (cf. Section 4.2) is implemented in Java for the
Android smartphones, resulting in an online application for long-term physical activity
monitoring. The feature extraction step is optimised so that each feature is computed
at most once per window segment. For the classification step, the ConfAdaBoost.M1
algorithm with the C4.5 decision tree as weak learner is chosen, since it is the best
performing classifier throughout this thesis (cf. Chapter 6). The feasibility of
applying such complex classifiers on mobile devices is studied in Section 8.3.2.
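One way to guarantee that each feature is computed at most once per window, and only when a classifier actually needs it, is a lazily filled per-window cache that the decision trees query while walking from the root to a leaf. The following Python sketch (the prototype itself is in Java) illustrates this under that assumption; the class names, the two toy features and the tree are hypothetical. Sharing one `WindowFeatures` object between the intensity and activity classifiers would let both reuse the same cached values, and on-demand computation would be consistent with the fractional average feature counts reported in Table 8.3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

Extractor = Callable[[Sequence[float]], float]


class WindowFeatures:
    """Features of one window segment, each computed on first request only."""

    def __init__(self, window: Sequence[float], extractors: Dict[str, Extractor]):
        self.window = window
        self.extractors = extractors
        self.cache: Dict[str, float] = {}

    def __getitem__(self, name: str) -> float:
        if name not in self.cache:
            self.cache[name] = self.extractors[name](self.window)
        return self.cache[name]


@dataclass
class Node:
    """Binary decision tree node; leaves carry a label, inner nodes a test."""
    label: Optional[str] = None
    feature: str = ""
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def classify(node: Node, feats: WindowFeatures) -> str:
    """Walk from the root to a leaf, requesting only the features on that path."""
    while node.label is None:
        node = node.left if feats[node.feature] <= node.threshold else node.right
    return node.label


extractors: Dict[str, Extractor] = {
    "mean": lambda w: sum(w) / len(w),
    "peak": lambda w: max(w),
}
# Hypothetical toy tree: low mean -> "lie"; otherwise split on the peak value.
tree = Node(feature="mean", threshold=1.0,
            left=Node(label="lie"),
            right=Node(feature="peak", threshold=3.0,
                       left=Node(label="walk"), right=Node(label="run")))

quiet = WindowFeatures([0.5, 0.5], extractors)
print(classify(tree, quiet), sorted(quiet.cache))  # lie ['mean']
```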
Both intensity estimation and activity recognition are included in the mobile
application. For both tasks, the definitions presented in Section 4.4.1 are used (the
background task for activity recognition), so 3 intensity classes and 7 activity classes
are to be distinguished. To deal with the other activities in the activity recognition
task, the ‘bgClass’ model is used, as it has the best generalization characteristics
(cf. Chapter 5). The boosted decision tree classifiers for both the intensity estimation
and activity recognition tasks were trained on the PAMAP2 dataset.
Apart from the implemented data processing chain, the mobile application of the final
prototype also provides a graphical user interface (GUI). On the one hand, the GUI
includes a labeling tool similar to the one presented in Section 3.3.1, so further data
collection can be performed with the final prototype, which offers a robust and
unobtrusive system for this purpose. On the other hand, the GUI also provides feedback
to the user by visualizing the results on the smartphone’s display. The type of feedback
is described in detail in Section 8.3.3, which also provides visualization examples of
the mobile application.
The best performing classifiers throughout this thesis were the different boosted
decision tree classifiers. They have several further benefits: for example, they have a
simple structure (essentially a large if-then-else structure) and are thus easy to
implement. However, boosting increases complexity, since the boosted classifier is about
T times larger than the applied base-level classifier, T being the number of boosting
iterations. Boosted classifiers therefore have larger computational requirements. The
question thus arises whether such complex classifiers are feasible for online activity
monitoring applications, since the mobile systems these applications run on have
limited computational power. This subsection describes two empirical studies performed
to examine this question.
Table 8.3: Feasibility study I: comparing size and average computational cost
of classification of the three different decision tree (DT) classifiers on a mobile
device (Viliv S5 UMPC).
Classifier    Size   No. of leaves   Computation time (ms)   No. of features
Custom DT       15               8                     4.9              3.57
C4.5 DT        119              60                    23.6              8.82
Boosted DT    1464             737                   184.3             79.78
Study I: AdaBoost.M1
The first feasibility study compares AdaBoost.M1 (with the C4.5 decision tree as weak
classifier) to two other decision tree classifiers: a custom decision tree classifier
(cf. Figure 4.4) and a C4.5 decision tree classifier. The latter two classifiers were
selected because each represents a different complexity level, and because all three
classifiers have a binary tree structure, which makes their comparison straightforward.
All three classifiers were introduced in Section 4.2.4.
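Why a boosted classifier costs roughly T times as much as its base classifier follows from the AdaBoost.M1 decision rule of Freund and Schapire: every one of the T weak learners is evaluated and casts a vote weighted by log(1/beta_t). The sketch below shows only this prediction step; the weak learners are hypothetical constant functions standing in for trained C4.5 trees, and the thesis' own ConfAdaBoost.M1 variant is not reproduced here.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

WeakLearner = Callable[[Sequence[float]], str]


def adaboost_m1_predict(ensemble: List[Tuple[WeakLearner, float]],
                        x: Sequence[float]) -> str:
    """Final AdaBoost.M1 hypothesis: a weighted vote over all T weak learners.

    `ensemble` holds (h_t, beta_t) pairs with beta_t = eps_t / (1 - eps_t),
    eps_t being the weighted training error of round t (0 < eps_t < 0.5).
    Learner t votes for its predicted class with weight log(1 / beta_t).
    Every learner is evaluated, so the cost grows roughly linearly with T."""
    votes: Dict[str, float] = defaultdict(float)
    for h_t, beta_t in ensemble:
        votes[h_t(x)] += math.log(1.0 / beta_t)
    return max(votes, key=votes.get)


# Hypothetical toy ensemble with T = 3 constant "trees".
ensemble = [
    (lambda x: "walk", 0.1),   # accurate round: vote weight log(10)
    (lambda x: "run", 0.5),    # weaker rounds: vote weight log(2) each
    (lambda x: "run", 0.5),
]
print(adaboost_m1_predict(ensemble, [0.0]))  # walk
```

The example also shows that a single accurate round can outvote several weaker ones, since log(10) exceeds 2 log(2).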
All three classifiers were implemented in C++ on a Viliv S5 UMPC (the control unit
used for the PAMAP2 data collection, cf. Section 3.3.1), containing an Intel Atom Z520
CPU (1.33GHz) and 1GB of RAM. The implementation consists of one data collection thread
per sensor (including preprocessing and segmentation of the raw sensory data) and a data
processing thread for feature extraction and classification. Each of the three
classifiers was trained offline on the PAMAP dataset. The trained binary tree structures
were then converted into C++ code for online classification on the UMPC. With each of
the classifiers, a protocol of approximately 15 minutes was followed while wearing the
mobile system. This protocol included a wide range of activities (lying, sitting,
standing, walking, running, ascending stairs and descending stairs), so that the
classifiers could be observed in different states of their operation.
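The thread layout described above is a classic producer-consumer arrangement. The following is a heavily simplified Python sketch (the original was C++): one collection thread per sensor pushes segments into a thread-safe queue, and a single processing thread consumes them. Unlike the real system, the sketch processes each sensor's segment on its own instead of fusing synchronized windows of all sensors, and its "classifier" is a placeholder threshold on the mean; all names are hypothetical.

```python
import queue
import threading
from typing import List, Tuple

Segment = Tuple[str, List[float]]  # (sensor name, preprocessed window data)


def collection_thread(sensor: str, segments: List[List[float]],
                      out: "queue.Queue[Segment]") -> None:
    """One per sensor; here it only forwards pre-recorded toy segments."""
    for segment in segments:
        out.put((sensor, segment))


def processing_thread(inp: "queue.Queue[Segment]", n_segments: int,
                      results: List[str]) -> None:
    """Single consumer standing in for feature extraction and classification."""
    for _ in range(n_segments):
        sensor, segment = inp.get()
        mean = sum(segment) / len(segment)             # placeholder feature
        results.append(f"{sensor}: {'active' if mean > 1.0 else 'rest'}")


data = {"chest": [[0.2, 0.4], [1.5, 2.5]], "foot": [[3.0, 1.0]]}
segment_queue: "queue.Queue[Segment]" = queue.Queue()
results: List[str] = []

threads = [threading.Thread(target=collection_thread, args=(s, seg, segment_queue))
           for s, seg in data.items()]
threads.append(threading.Thread(target=processing_thread,
                                args=(segment_queue, 3, results)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))
```

Decoupling collection from processing in this way keeps slow classification steps from blocking the sensor streams, as long as processing keeps up on average.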
Table 8.3 compares the size and average computational cost of the three decision tree
classifiers. The size of a classifier refers to the number of decision and leaf nodes
together. The computation time includes the classification and the computation of the
features required for the respective classification step, and was measured in the
above-mentioned data processing thread of the online application. The number of
features refers to the average number of features computed per window segment. Note
that the above-mentioned optimization of the feature extraction step (computing each
feature at most once per window segment) is applied here.
Table 8.4: Feasibility study II: comparing computational cost and performance
of the three different C4.5 decision tree (DT) based classifiers on a smartphone
(Samsung Galaxy S III).
Classifier        Time avg. (ms)   Time max. (ms)   Acc. intensity est. [%]   Acc. activity rec. [%]
C4.5 DT                     3.32               42                     94.36                    93.84
AdaBoost.M1                24.54              131                     96.18                    98.98
ConfAdaBoost.M1            51.54              150                     99.79                   100
Although the results in Table 8.3 show that the computational cost of the boosted
decision tree classifier is nearly an order of magnitude higher than that of the C4.5
decision tree classifier (184.3 ms vs. 23.6 ms), it is still far below the limit given
by the application. This limit results from the segmentation step of the DPC, which
uses a sliding window shifted by 1 second: the data processing thread therefore has at
most 1 second for each processing step. This empirical study thus showed that the more
complex boosted decision tree classifier is a viable choice even for mobile activity
monitoring applications, without limitations regarding the computational cost.
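The 1-second budget follows directly from the window shift: a new window is completed every second, so each processing step must finish before the next window is ready. A sketch of the segmentation and of this real-time check, assuming a 100 Hz sampling rate and a 512-sample window (both are assumptions of the sketch; Section 4.2 defines the actual DPC parameters), applied to measured times from Tables 8.3 and 8.4:

```python
from typing import Iterator, Sequence


def sliding_windows(samples: Sequence[float], window_len: int,
                    shift: int) -> Iterator[Sequence[float]]:
    """Yield overlapping window segments, a new one every `shift` samples."""
    for start in range(0, len(samples) - window_len + 1, shift):
        yield samples[start:start + window_len]


def within_budget(processing_ms: float, shift: int, fs_hz: float) -> bool:
    """Real-time condition: one window must be processed before the next is ready."""
    budget_ms = 1000.0 * shift / fs_hz
    return processing_ms < budget_ms


FS_HZ = 100.0       # assumed sampling rate
WINDOW_LEN = 512    # assumed window length (samples)
SHIFT = 100         # 1 s shift at the assumed sampling rate

samples = [0.0] * 1000                     # 10 s of dummy data
print(sum(1 for _ in sliding_windows(samples, WINDOW_LEN, SHIFT)))  # 5
print(within_budget(184.3, SHIFT, FS_HZ))  # True
```

Even the slowest measurement reported in this section stays well below the 1000 ms budget.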
Apart from demonstrating the feasibility of using ConfAdaBoost.M1 for online activity
monitoring on smartphones, this second study also serves as a proof of concept: the
proposed final prototype of the mobile system was fully realized and tested. Both the
provided labeling tool and the implemented data processing chain worked well, so
overall a fully functional, robust and unobtrusive mobile physical activity monitoring
system was realized in this section. Moreover, the comparison of the three classifiers’
performance in Table 8.4 confirms previous results of this thesis: the ConfAdaBoost.M1
algorithm outperforms the other classifiers on both the intensity estimation and the
activity recognition task. Although these results were achieved with data from only two
subjects and are subject dependent, they show a clear tendency and thus justify applying
ConfAdaBoost.M1 in the implemented data processing chain.
The previous chapters of this thesis presented various methods for physical activity
monitoring, while the previous sections of this chapter described the creation of a
modular, mobile activity monitoring system. However, all these efforts would serve
little purpose without giving feedback to the user, i.e. without visualizing the results
of data processing and classification. Online visualization in activity monitoring
applications is important to help the user reflect on the results and gain insights
into his behaviour, which could then encourage him to continue or even increase his
physical activity. Therefore, this subsection investigates how to provide the user with
understandable, helpful and motivating feedback. Example snapshots from the realized
mobile systems are shown, visualizing results of activity recognition, intensity
estimation and various heart rate-related features.
The most common visualization tools used to represent results of activity monitoring
are charts (e.g. bars or lines) showing the time spent performing the different
recognized classes [18, 134]. Another form of representation is based on living
metaphors, e.g. a fish that looks happy or sad depending on how far the user met the
activity goals [97]. Moreover, Fan et al. [46] introduced the visualization tool Spark,
which displays activity data using circles of different colour and size animated in
various ways. In a field study they found that such abstract visual rewards encouraged
some of the test subjects to be more active than usual. A further common motivational
tool is to award trophies when the user accomplishes a certain goal, e.g. 10,000 steps
a day (cf. the commercially available product Fitbit [50]).
Since the main goals of this thesis are not related to the visualization of the results,
rather simple methods were implemented in the different prototypes of the mobile
activity monitoring system to give feedback to the user. This feedback visualizes the
results of data processing and classification: which activity the user performed, for
how long and with what intensity. Figure 8.1 shows the activity summary as given in the
mobile application implemented on the Viliv S5 UMPC, the control unit used to record
the PAMAP2 dataset. From this display, the user can see a summary of the activities
performed on the current day. Figure 8.1 visualizes the results of a session of nearly
three hours; the activity recognition and intensity estimation tasks both include the
classes as defined before (the icon with the question mark refers to the background
activity class). With this online feedback, the user can check his progress anywhere and
at any time, and thus learn e.g. how much more physical activity he should perform to
reach the general recommendations of Haskell et al. [66]. The GUI of the final prototype
of the mobile system includes a similar visualization tool. Figure 8.2 shows an example
snapshot from this system, taken on the Samsung Galaxy S III smartphone.
Figure 8.1: Visualization of the data processing and classification results: Example
of the activity summary on the Viliv S5 UMPC.
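The activity summary of Figure 8.1 is essentially an aggregation of the per-window classifier outputs. A minimal sketch, assuming that each 1-second window shift contributes one (activity, intensity) prediction pair; the class names are illustrative, and the real GUI additionally renders the result graphically.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def daily_summary(predictions: Sequence[Tuple[str, str]],
                  step_s: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Minutes per recognized activity and per intensity class.

    `predictions` holds one (activity, intensity) pair per processed window;
    with a window shift of `step_s` seconds each pair stands for that long."""
    activities = Counter(activity for activity, _ in predictions)
    intensities = Counter(intensity for _, intensity in predictions)
    return {
        "activity_min": {a: n * step_s / 60.0 for a, n in activities.items()},
        "intensity_min": {i: n * step_s / 60.0 for i, n in intensities.items()},
    }


# Toy stream: 2 min of walking, 1 min sitting, 30 s running.
stream = ([("walk", "moderate")] * 120 + [("sit", "light")] * 60
          + [("run", "vigorous")] * 30)
summary = daily_summary(stream)
print(summary["activity_min"])   # {'walk': 2.0, 'sit': 1.0, 'run': 0.5}
```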
As discussed in Section 8.2, additional information can be displayed to the user thanks
to the available heart rate data. On the one hand, an individual heart rate limit can be
defined in the system, e.g. for cardiac patients, and an alarm is triggered when this
value is exceeded. On the other hand, a summary of how much time the user spent in
different heart rate zones can be provided; these zones are based on the proposal of
Fox et al. [52] and are widely used in sports applications. Figure 8.3 shows this
feedback as visualized on the Viliv control unit. The heart rate summary was taken from
the same session as Figure 8.1.
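Given a user's age, the zone summary of Figure 8.3 can be sketched as follows. The sketch assumes the age-predicted maximum heart rate HRmax = 220 - age (the formula commonly attributed to Fox et al.) and uses hypothetical zone limits at 50, 60, 70, 80, 90 and 100% of HRmax; the exact zone definition used in the prototype is not restated in this section.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def hr_max_from_age(age_years: int) -> float:
    """Age-predicted maximum heart rate, HRmax = 220 - age."""
    return 220.0 - age_years


def time_in_zones(hr_bpm: Sequence[float], age_years: int,
                  bounds_pct: Sequence[float] = (50, 60, 70, 80, 90, 100),
                  sample_period_s: float = 1.0) -> Dict[str, float]:
    """Seconds spent in each %HRmax zone (zone k: bounds[k-1] <= HR < bounds[k])."""
    hr_max = hr_max_from_age(age_years)
    bounds = [hr_max * p / 100.0 for p in bounds_pct]
    labels = (["below zone 1"] + [f"zone {k}" for k in range(1, len(bounds))]
              + ["HRmax or above"])
    totals = {label: 0.0 for label in labels}
    for hr in hr_bpm:
        totals[labels[bisect_right(bounds, hr)]] += sample_period_s
    return totals


# 40-year-old user: HRmax = 180 bpm, so the zone limits are 90, 108, ..., 180 bpm.
summary = time_in_zones([80, 100, 110, 130, 150, 170, 185], age_years=40)
print(summary["zone 3"])  # 1.0
```

Because the limits depend only on personal data such as age, they can be set once when the mobile platform is configured for a new user.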
Figure 8.2: Visualization of the data processing and classification results: Exam-
ple of the activity summary on the Samsung Galaxy S III smartphone.
Figure 8.3: Visualization of the features derived from the recorded heart rate
information: Example of the heart rate summary on the Viliv S5 UMPC.
the monitoring of aerobic activities, with the following two goals: 1) estimating the
intensity of performed activities to answer how far a patient meets the general
recommendations of [66] or the goals defined by the clinician in a care plan, and 2)
recognizing the aerobic activities that are traditionally recommended, to give a more
detailed description of a patient’s daily routine. In this section, the mobile system
described above is integrated into a healthcare system supporting out-of-hospital
services.
Various online applications for physical activity monitoring have already been presented
in related work (cf. e.g. [42]), some of which also provide feedback to the user to
preserve motivation (e.g. [18]). However, the overall system described in this section
is the first attempt to fully integrate such mobile systems into a professional
healthcare system. The integration of the mobile platform with an Electronic Health
Record (EHR) has many benefits. For example, it provides access for both the clinician
(e.g. to enter a patient’s medical record or to set up a care plan in the EHR) and the
patient (e.g. to watch assigned educational material). It also provides valuable
information to the clinical personnel for supervising program adherence and following
the patient’s rehabilitation progress on a daily basis. Moreover, feedback is also given
to the patient about his daily progress, preserving or even increasing his motivation to
follow the defined care plan.
The integration of the mobile activity monitoring system with an EHR was realized within
the PAMAP project [116], in the aerobic activity monitoring use case. Figure 8.4 shows
the major components and their interaction in the proposed overall system for aerobic
activity monitoring and support in daily life. The EHR serves for the collection and
management of information (related to the medical profile and history of the monitored
subject, and to the collected activity information), and is further described in the
next subsection. The main purpose of the mobile platform is to monitor the user’s daily
activities by collecting and processing sensory data, but it also gives instant feedback
to the user. This mobile system was described in detail in Section 8.3. The Clinician’s
WEB Interface provides a web-based user interface to the EHR for the physicians. It
enables the clinician to view and edit the medical record of the monitored subject (cf.
Section 8.4.2), to define a personal daily program of aerobic activities for the
subjects, to define and upload educational material for each of his patients
individually, and to view a summary of the patient’s performed activities over a
specific day (cf. Section 8.4.3). The Individual’s Interactive TV (i-TV) interface
provides the monitored subjects with the means to use the services the system offers
them. Specifically, the patient can view his own subset of the EHR, view educational
material (e.g. watch short videos) assigned to him by his clinician, and see the defined
program of aerobic activities for the current day. The i-TV thereby provides a
convenient interface even for subjects who are not very familiar with computers,
especially the elderly, since it can be controlled with a standard TV remote control.
Figure 8.4: The integration of the mobile physical activity monitoring system into a
complete healthcare system. The figure shows the major components and their interaction
for aerobic activity monitoring and support in daily life.
A typical scenario of how the clinician and a new patient use the system, interacting
with its different components, is described in the following steps. Steps 1-3 are
carried out only when setting up a new patient, while steps 4-8 are performed every day;
the numbers in the major components of Figure 8.4 refer to these steps:
1. The clinician adds (registers) the new patient in the EHR and enters information
related to his medical profile and history.
2. The clinician draws up the care plan to be followed by the patient, and enters
it into the EHR. This care plan includes a set of measurements to be performed
periodically, a set of questionnaires to be filled out, a set of educational material
to inform the patient, etc. The care plan also defines a personal program of
aerobic activities to be followed by the new patient.
3. The clinician downloads basic personal information (age, resting heart rate, etc.)
of the new patient into the mobile platform before its first use by the patient, using
the mobile application (this is done automatically after pressing the corresponding
button). This personal data is used for the computation of personalized features (cf.
Section 4.2.3), and defines parameters for the heart rate summary screen (HR-zones and
maximum HR, cf. Section 8.3.3 and Figure 8.3).
4. At home in the morning, the patient checks the activity program assigned for the
current day, using the i-TV interface.
5. The patient wears the mobile platform (cf. Section 8.3) during the active part of
the day. The mobile application records and processes the sensory data.
6. The patient can check the mobile application at any time to see his progress. The
mobile application’s GUI gives feedback to the monitored subject, as described in
Section 8.3.3.
7. At the end of the day, the patient uploads the results of the activity monitoring to
the EHR using the mobile application (this is done automatically after pressing the
corresponding button).
8. The clinician can view the patient’s daily progress using the visualization provided
in the web interface of the EHR (cf. Section 8.4.3), thus supervising how far the
patient followed the defined program. If the program was not followed sufficiently, or
if it has to be readjusted, the clinician can contact the patient.
Figure 8.5: Example screen of the Electronic Health Record: health-related habits
of living.
EHR (step 7 in the scenario described above), or personal information can be queried
by the mobile application from the EHR (step 3 in the scenario). Both actions are
predefined tasks in the mobile application and can therefore be easily executed by
pressing the corresponding button.
Figure 8.6: The clinician’s feedback about the patient’s daily progress, as pro-
vided in the integrated system: Example activity summary as shown in the EHR’s
web interface.
8.5 Conclusion
provement. For example, the clinical trials with 30 elderly subjects revealed some
weaknesses in handling various housework activities. Therefore, a more advanced
post-processing step or the introduction of high-level activity recognition should
be investigated in future work.
Conclusion
9
The main goal defined for this thesis was the development of a mobile, personalized
physical activity monitoring system applicable to everyday life scenarios. This goal
was motivated by the fact that regular physical activity is essential to maintain or
even improve an individual's health. It is therefore important to monitor how much
physical activity individuals perform during their daily routine, in order to determine
to what extent they meet professional recommendations. Such recommendations and
general guidelines exist for all age groups and cover aerobic, muscle-strengthening,
flexibility and balance exercises. Of these recommendations, this thesis concentrated
on monitoring aerobic physical activity. Two main objectives were defined in this
context. On the one hand, the goal was to estimate the intensity of performed
activities, i.e. to distinguish activities of light, moderate or vigorous effort. On the
other hand, the goal was to recognize basic aerobic activities (such as walk, run or
cycle) and basic postures (lie, sit and stand). This way, the developed system can give
a more detailed description of an individual's daily routine.
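The intensity objective can be made concrete with metabolic equivalents (MET). One common convention, used for example in the ACSM/AHA recommendations [66], treats activities below 3 MET as light, 3 to 6 MET as moderate and above 6 MET as vigorous. The Python sketch below makes two assumptions: exactly these thresholds, and one MET estimate per fixed-length window as input. It illustrates the three-class target only. It is not the intensity estimation method developed in this thesis, and the function names are hypothetical.

```python
from collections import Counter


def intensity_class(met):
    """Map one MET estimate to an intensity class (assumed thresholds 3.0 and 6.0)."""
    if met < 3.0:
        return "light"
    if met <= 6.0:
        return "moderate"
    return "vigorous"


def intensity_summary(met_per_window):
    """Count how many windows fall into each intensity class."""
    counts = Counter(intensity_class(m) for m in met_per_window)
    return {cls: counts.get(cls, 0) for cls in ("light", "moderate", "vigorous")}


print(intensity_summary([1.2, 2.5, 3.5, 4.0, 7.5]))
# {'light': 2, 'moderate': 2, 'vigorous': 1}
```

Multiplying each count by the window length gives the time spent at each intensity level, which can then be compared with a recommendation.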
9.1 Results
The hardware needed to create the desired physical activity monitoring system in an
unobtrusive way already exists, e.g. current smartphone technology and wearable
sensors. Therefore, the focus of this thesis was on the development of methods for
physical activity recognition and intensity estimation that are applicable to the
envisioned mobile system. Emphasis was placed on identifying key challenges in this
research field and on addressing them with novel methods and algorithms. Moreover,
great importance was attached to the evaluation of the proposed methods: thorough
experiments are presented in the respective chapters of this thesis to justify the
introduced data processing and classification methods.
• Creation of two new datasets for physical activity monitoring, including a wide
range of physical activities. Moreover, both datasets have been made publicly
available and have already had a certain impact on the research community.
• Investigation of the means to create robust activity monitoring systems for
everyday life, including the concept and modeling of other activities, and
highlighting the importance of subject-independent validation techniques.
The listed contributions are of both theoretical value (cf. e.g. the novel algorithms
and developed models) and practical value (cf. e.g. the proposed evaluation
techniques). Some of the contributions directly benefit the research community (e.g.
the created and benchmarked datasets). Moreover, this thesis also deals with the
implementation of the presented methods, in order to realize the envisioned mobile
system for physical activity monitoring.
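Leave-one-subject-out (LOSO) cross-validation, listed in Appendix A, is one subject-independent validation technique among those mentioned in the contributions. Every subject is held out once as the test set while the model is trained on all other subjects, so no subject contributes data to both sides of a split. The Python generator below is a minimal sketch of the fold construction only. It assumes one subject ID per sample (e.g. per segmented window) and leaves training and evaluation to the caller. The function name is hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids[i] is the subject who produced sample i (e.g. one window).
    """
    for subject in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        yield subject, train, test


# Example: five windows recorded by three subjects.
for subject, train, test in leave_one_subject_out([101, 101, 102, 103, 102]):
    print(subject, train, test)
# 101 [2, 3, 4] [0, 1]
# 102 [0, 1, 3] [2, 4]
# 103 [0, 1, 2, 4] [3]
```

A random split of the same windows would usually put data from every subject into both the training and the test set. That tends to give optimistic results for a system that will later face previously unseen users.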
Extensive Data Collection. Although this thesis introduced two large datasets, these
datasets still have some shortcomings. Two important limitations are the homogeneous
set of users (considering e.g. age or physical fitness) and the only semi-naturalistic
data collection protocol. However, the mobile system presented in Section 8.3.1
provides a robust and unobtrusive tool for further data recordings. A new dataset of
physical activities should include subjects from all age groups: children,
adolescents, young and middle-aged adults, and the elderly. Moreover, further user
groups should be included as well, e.g. overweight people or people with
disabilities. Such a dataset would provide an excellent basis to further evaluate and
improve the personalization approaches proposed in Chapter 7. Furthermore, a new
dataset should be recorded, at least partially, under realistic conditions, i.e. during
the subjects' regular daily routine. This would also make it possible to test e.g. the
user acceptance of the mobile system, and would provide data for high-level activity
recognition.
Semi-supervised Learning. The methods presented in this thesis all rely exclusively
on annotated data. However, as discussed in Chapter 3, obtaining ground truth for
recorded sensory data is not straightforward. With the available technology it is
easy to generate large datasets nowadays, but labeling still requires expensive human
effort. Therefore, semi-supervised learning receives increasing attention in the
machine learning community [204]. These methods combine a small amount of labeled
data with large amounts of unlabeled data. Semi-supervised learning methods have
recently been applied in the physical activity monitoring research field, delivering
promising results [5, 37, 76]. A special case of semi-supervised learning is active
learning, where the learning algorithm chooses the most informative data samples to
be annotated [159]. An application of this approach to human activity recognition
was shown by Alemdar et al. [3]. Therefore, and considering the plan for a new
extensive data collection described above, semi-supervised learning methods deserve
further attention.
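To show how an active learner can pick "the most informative data samples", the sketch below implements least-confidence uncertainty sampling. This is one common selection strategy and not necessarily the one used in [3]. A classifier assigns class probabilities to each unlabeled window. The function returns the windows whose most likely class has the lowest probability, and these would then be passed to a human annotator. The function name and the input format (one list of class probabilities per window) are assumptions.

```python
def least_confident(probabilities, k):
    """Return indices of the k unlabeled samples with the lowest top-class probability.

    probabilities: one list of class probabilities per unlabeled sample.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:k]


# Four unlabeled windows, two classes; ask the annotator about two of them.
probs = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.99, 0.01]]
print(least_confident(probs, 2))  # [1, 2]
```

In a full active learning loop, the selected windows are labeled, added to the training set, and the classifier is retrained before the next selection round.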
Extension of the Modular Activity Monitoring System. Two sensors have been in-
vestigated in this thesis: accelerometer and heart rate monitor. Building on these two
sensors, a modular mobile activity monitoring system was presented in Chapter 8.
Abbreviations and Acronyms
A
Abbreviation  Meaning
ICDM          International Conference on Data Mining
IMU           Inertial measurement unit
ISWC          International Symposium on Wearable Computers
kNN           k-Nearest Neighbor
LOAO          Leave-one-activity-out
LOOAO         Leave-one-other-activity-out
LOSO          Leave-one-subject-out
MEMS          Micro-electro-mechanical system
MET           Metabolic equivalent
MFCC          Mel-frequency cepstral coefficient
MHR           Maximum heart rate
MV            Majority voting
PAMAP         Physical activity monitoring for aging people
PCA           Principal component analysis
PSD           Power spectral density
RFID          Radio-frequency identification
RSSI          Received signal strength indicator
RWMA          Randomized weighted majority algorithm
SAMME         Stagewise additive modeling using a multi-class exponential loss function
SVM           Support vector machine
UMPC          Ultra-mobile personal computer
WEKA          Waikato Environment for Knowledge Analysis
WHO           World Health Organization
WMA           Weighted majority algorithm
WMV           Weighted majority voting
Datasets: Supplementary Material
B
This appendix presents supplementary material related to the PAMAP and PAMAP2
datasets, both described in Chapter 3.
Table B.1: Data format of the published PAMAP dataset. The data files contain
45 columns, described in the first part of the table. The second part specifies the
data content of each IMU sensor (hand, chest or foot).

Data file:
Column   Data content
1        timestamp (s)
2        activity ID
3        heart rate (bpm)
4-17     IMU hand
18-31    IMU chest
32-45    IMU foot

IMU sensor data (columns relative to the start of each IMU block):
Column   Data content
1        temperature (°C)
2-4      3D-accelerometer (m/s²)
5-7      3D-gyroscope (°/s)
8-10     3D-magnetometer (µT)
11-14    orientation (turned off)
Table B.2: Data format of the published PAMAP2 dataset. The data files contain
54 columns, described in the first part of the table. The second part specifies the
data content of each IMU sensor (hand, chest or ankle).

Data file:
Column   Data content
1        timestamp (s)
2        activity ID
3        heart rate (bpm)
4-20     IMU hand
21-37    IMU chest
38-54    IMU ankle

IMU sensor data (columns relative to the start of each IMU block):
Column   Data content
1        temperature (°C)
2-4      3D-accelerometer (m/s²), scale: ±16g
5-7      3D-accelerometer (m/s²), scale: ±6g
8-10     3D-gyroscope (°/s)
11-13    3D-magnetometer (µT)
14-17    orientation (turned off)
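Tables B.1 and B.2 fully determine where each sensor value sits in a data row, so they can be transcribed directly into code. The following Python sketch does that and splits one row into named fields. It assumes whitespace-separated text files with one row per timestamp and the token NaN for missing values. These file-format details, the field names (e.g. acc16, acc6) and the functions themselves are assumptions for illustration, not part of the published dataset description.

```python
# Column layouts transcribed from Tables B.1 (PAMAP) and B.2 (PAMAP2); 1-based.
LAYOUTS = {
    "PAMAP": {
        "n_columns": 45,
        # First data-file column of each IMU block.
        "imus": {"hand": 4, "chest": 18, "foot": 32},
        # (first, last) column of each field within one IMU block.
        "fields": {"temperature": (1, 1), "acc": (2, 4), "gyro": (5, 7),
                   "mag": (8, 10), "orientation": (11, 14)},
    },
    "PAMAP2": {
        "n_columns": 54,
        "imus": {"hand": 4, "chest": 21, "ankle": 38},
        "fields": {"temperature": (1, 1), "acc16": (2, 4), "acc6": (5, 7),
                   "gyro": (8, 10), "mag": (11, 13), "orientation": (14, 17)},
    },
}


def column_range(dataset, imu, field):
    """Return the 1-based (first, last) data-file columns of one IMU field."""
    layout = LAYOUTS[dataset]
    first, last = layout["fields"][field]
    offset = layout["imus"][imu] - 1
    return offset + first, offset + last


def parse_line(line, dataset="PAMAP2"):
    """Split one whitespace-separated data row into named fields.

    float() accepts the token "NaN", so missing values become float('nan').
    """
    layout = LAYOUTS[dataset]
    values = [float(tok) for tok in line.split()]
    if len(values) != layout["n_columns"]:
        raise ValueError(f"expected {layout['n_columns']} columns, got {len(values)}")
    row = {"timestamp": values[0], "activity_id": int(values[1]), "heart_rate": values[2]}
    for imu in layout["imus"]:
        for field in layout["fields"]:
            first, last = column_range(dataset, imu, field)
            row[f"{imu}_{field}"] = values[first - 1:last]
    return row


# Usage sketch (file name hypothetical):
# with open("subject101.dat") as f:
#     rows = [parse_line(line) for line in f]
```

Since the orientation columns are marked as turned off in both tables, a feature extraction step would normally ignore the corresponding `*_orientation` entries.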
Bibliography
[2] Fahd Albinali, Stephen S. Intille, William L. Haskell, and Mary Rosenberger.
Using wearable activity type detection to improve physical activity energy ex-
penditure estimation. In Proceedings of 12th International Conference on
Ubiquitous Computing (UbiComp), pages 311–320, Copenhagen, Denmark,
September 2010.
[3] Hande Alemdar, Tim L. M. van Kasteren, and Cem Ersoy. Using active learning
to allow activity recognition on a large scale. In Proceedings of 2nd Interna-
tional Conference on Ambient Intelligence (AmI), pages 105–114, Amsterdam,
Netherlands, November 2011.
[4] Leslie Alford. What men should know about the impact of physical activity
on their health. International Journal of Clinical Practice, 64(13):1731–1734,
December 2010.
[5] Aziah Ali, Rachel C. King, and Guang-Zhong Yang. Semi-supervised segmenta-
tion for activity recognition with Multiple Eigenspaces. In Proceedings of 5th
International Workshop on Wearable and Implantable Body Sensor Networks
(BSN), pages 314–317, Hong Kong, China, June 2008.
[6] Fevzi Alimoglu and Ethem Alpaydin. Methods of combining multiple classi-
fiers based on different representations for pen-based handwritten digit recog-
nition. In Proceedings of 5th Turkish Artificial Intelligence and Artificial Neu-
ral Networks Symposium (TAINN), Istanbul, Turkey, 1996.
[7] Bashar Altakouri, Gerd Kortuem, Agnes Grünerbl, Kai Kunze, and Paul Lukow-
icz. The benefit of activity recognition for mobile phone based nursing doc-
umentation: a Wizard-of-Oz study. In Proceedings of IEEE 14th Interna-
tional Symposium on Wearable Computers (ISWC), Seoul, South Korea, Oc-
tober 2010.
[8] Kerem Altun, Billur Barshan, and Orkun Tunçel. Comparative study on classi-
fying human activities with miniature inertial and magnetic sensors. Pattern
Recognition, 43(10):3605–3620, October 2010.
[9] Ran Avnimelech and Nathan Intrator. Boosting regression estimators. Neural
Computation, 11(2):499–520, February 1999.
[10] Oresti Baños, Miguel Damas, Héctor Pomares, Ignacio Rojas, Máté A. Tóth, and
Oliver Amft. A benchmark dataset to evaluate sensor displacement in activ-
ity recognition. In Proceedings of 14th International Conference on Ubiqui-
tous Computing (UbiComp), pages 1026–1035, Pittsburgh, PA, USA, Septem-
ber 2012.
[11] Oresti Baños, Miguel Damas, Héctor Pomares, Fernando Rojas, Blanca Delgado-
Marquez, and Olga Valenzuela. Human activity recognition based on a sen-
sor weighting hierarchical classifier. Soft Computing, 17(2):333–343, February
2013.
[12] Kevin Bache and Moshe Lichman. UCI Machine Learning Repository. URL
http://archive.ics.uci.edu/ml.
[13] Marc Bächlin, Martin Kusserow, Hanspeter Gubelmann, and Gerhard Tröster.
Ski jump analysis of an Olympic champion with wearable acceleration sensors.
In Proceedings of IEEE 14th International Symposium on Wearable Computers
(ISWC), Seoul, South Korea, October 2010.
[14] Gernot Bahle, Kai Kunze, Koichi Kise, and Paul Lukowicz. I see you: How
to improve wearable activity recognition by leveraging information from en-
vironmental cameras. In Proceedings of 11th IEEE International Conference
on Pervasive Computing and Communications (PerCom), pages 409–412, San
Diego, CA, USA, March 2013.
[15] Ling Bao and Stephen S. Intille. Activity recognition from user-annotated ac-
celeration data. In Proceedings of 2nd International Conference on Pervasive
Computing (PERVASIVE), pages 1–17, Linz/Vienna, Austria, April 2004.
[16] Martin Berchtold, Matthias Budde, Dawud Gordon, Hedda R. Schmidtke, and
Michael Beigl. ActiServ: Activity recognition service for mobile phones. In
Proceedings of IEEE 14th International Symposium on Wearable Computers
(ISWC), Seoul, South Korea, October 2010.
[17] Gerald Bieber and Christian Peter. Using physical activity for user behavior
analysis. In Proceedings of 1st International Conference on Pervasive Technolo-
gies Related to Assistive Environments (PETRA), Athens, Greece, July 2008.
[18] Gerald Bieber, Jörg Voskamp, and Bodo Urban. Activity recognition for ev-
eryday life on mobile phones. In Proceedings of 5th International Conference
on Universal Access in Human-Computer Interaction (UAHCI), pages 289–296,
San Diego, CA, USA, July 2009.
[19] Ulf Blanke and Bernt Schiele. Remember and transfer what you have learned
- recognizing composite activities based on activity spotting. In Proceedings
of IEEE 14th International Symposium on Wearable Computers (ISWC), Seoul,
South Korea, October 2010.
[20] Avrim Blum. On-line algorithms in machine learning. In Proceedings of Work-
shop on On-Line Algorithms, Dagstuhl, pages 306–325, Dagstuhl, Germany,
June 1996.
[21] BM-innovations. BM innovations GmbH development and product website,
2013-09-16. URL http://www.bm-innovations.com.
[22] Alberto G. Bonomi, Guy Plasqui, Annelies Goris, and Klaas R. Westerterp. Im-
proving assessment of daily energy expenditure by identifying types of physi-
cal activity with a single accelerometer. Journal of Applied Physiology, 107(3):
655–661, September 2009.
[23] Marko Borazio and Kristof van Laerhoven. Combining wearable and environ-
mental sensing into an unobtrusive tool for long-term sleep studies. In Pro-
ceedings of 2nd ACM SIGHIT International Health Informatics Symposium
(IHI), pages 71–80, Miami, FL, USA, January 2012.
[24] Alan K. Bourke and Gerald M. Lyons. A threshold-based fall-detection algo-
rithm using a bi-axial gyroscope sensor. Medical Engineering & Physics, 30(1):
84–90, January 2008.
[25] Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, August
1996.
[26] Leah Buechley. A construction kit for electronic textiles. In Proceedings of IEEE
10th International Symposium on Wearable Computers (ISWC), pages 83–90,
Montreux, Switzerland, October 2006.
[27] Andrew Campbell and Tanzeem Choudhury. From smart to cognitive phones.
IEEE Pervasive Computing, 11(3):7–11, March 2012.
[28] Hong Cao, Minh Nhut Nguyen, Clifton Phua, Shonali Krishnaswamy, and
Xiao-Li Li. An integrated framework for human activity classification. In Proceedings
of 14th International Conference on Ubiquitous Computing (UbiComp), pages
331–340, Pittsburgh, PA, USA, September 2012.
[29] Kuang-I Chang, Yen-Hsien Lee, Yu-Jen Su, Hong-Dun Lin, and Bor-Nian
Chuang. Portable driver drowsiness prediction device and method. In Pro-
ceedings of 33rd Annual International IEEE EMBS Conference, pages 4390–
4393, Boston, MA, USA, August-September 2011.
[30] Chao Chen, Daqing Zhang, Lin Sun, Mossaab Hariz, and Yang Yuan. Does lo-
cation help daily activity recognition? In Proceedings of 10th International
Conference on Smart Homes and Health Telematics (ICOST), pages 83–90, Ar-
timino, Italy, June 2012.
[31] Heng-Tze Cheng, Martin Griss, Paul Davis, Jianguo Li, and Di You. Toward
zero-shot learning for human activity recognition using semantic attribute se-
quence model. In Proceedings of 2013 ACM International Joint Conference
on Pervasive and Ubiquitous Computing (UbiComp), pages 355–358, Zurich,
Switzerland, September 2013.
[32] David A. Clifton, Lei Clifton, Samuel Hugueny, David Wong, and Lionel
Tarassenko. An extreme function theory for novelty detection. IEEE Journal
of Selected Topics in Signal Processing, 7(1):28–37, February 2013.
[33] Sunny Consolvo, David W. McDonald, Tammy Toscos, Mike Y. Chen, Jon
Froehlich, Beverly Harrison, Predrag Klasnja, Anthony LaMarca, Louis
LeGrand, Ryan Libby, Ian Smith, and James A. Landay. Activity sensing in
the wild: a field trial of ubifit garden. In Proceedings of 26th SIGCHI Con-
ference on Human Factors in Computing Systems, pages 1797–1806, Florence,
Italy, April 2008.
[34] James W. Cooley and John W. Tukey. An algorithm for the machine calculation
of complex Fourier series. Mathematics of Computation, 19(90):297–301, 1965.
[35] Scott E. Crouter, Kurt G. Clowers, and David R. Bassett. A novel method for
using accelerometer data to predict energy expenditure. Journal of Applied
Physiology, 100(4):1324–1331, April 2006.
[36] Scott E. Crouter, James R. Churilla, and David R. Bassett. Accuracy of the
Actiheart for the assessment of energy expenditure in adults. European Journal
of Clinical Nutrition, 62(6):704–711, June 2008.
[37] Božidara Cvetković, Mitja Luštrek, Boštjan Kaluža, and Matjaž Gams.
Semi-supervised learning for adaptation of human activity recognition classifier to
the user. In Proceedings of 22nd International Joint Conference on Artificial
Intelligence (IJCAI), Barcelona, Spain, July 2011.
[38] Anthony Dalton and Gerald OLaighin. Comparing supervised learning tech-
niques on the task of physical activity recognition. IEEE Transactions on Infor-
mation Technology in Biomedicine, 2012.
[39] Jakob Doppler, Gerald Holl, Alois Ferscha, Marquart Franz, Cornel Klein, Mar-
cos Dos Santos Rocha, and Andreas Zeidler. Variability in foot-worn sensor
placement for activity recognition. In Proceedings of IEEE 13th International
Symposium on Wearable Computers (ISWC), pages 143–144, Linz, Austria,
September 2009.
[40] Thi V. Duong, Hung H. Bui, Dinh Q. Phung, and Svetha Venkatesh. Activity
recognition and abnormality detection with the switching hidden semi-Markov
model. In Proceedings of IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR), pages 838–845, San Diego, CA, USA,
June 2005.
[41] Günther Eibl and Karl P. Pfeiffer. How to make AdaBoost.M1 work for weak
base classifiers by changing only one line of the code. In Proceedings of 13th
European Conference on Machine Learning (ECML), pages 72–83, Helsinki,
Finland, August 2002.
[42] Miikka Ermes, Juha Pärkkä, and Luc Cluitmans. Advancing from offline to on-
line activity recognition with wearable sensors. In Proceedings of 30th Annual
International IEEE EMBS Conference, pages 4451–4454, Vancouver, Canada,
August 2008.
[43] Miikka Ermes, Juha Pärkkä, Jani Mäntyjärvi, and Ilkka Korhonen. Detection of
daily activities and sports with wearable sensors in controlled and uncontrolled
conditions. IEEE Transactions on Information Technology in Biomedicine,
12(1):20–26, January 2008.
[45] Jean Baptiste Faddoul, Boris Chidlovskii, Rémi Gilleron, and Fabien Torre.
Learning multiple tasks with boosted decision trees. In Proceedings of 2012
European Conference on Machine Learning and Principles and Practice of
Knowledge Discovery in Databases (ECML-PKDD), pages 681–696, Bristol, UK,
September 2012.
[46] Chloe Fan, Jodi Forlizzi, and Anind K. Dey. A spark of activity: exploring
informative art as visualization for physical activity. In Proceedings of 14th
International Conference on Ubiquitous Computing (UbiComp), pages 81–84,
Pittsburgh, PA, USA, September 2012.
[47] Jesus Favela, Monica Tentori, Luis A. Castro, Victor M. Gonzalez, Elisa B.
Moran, and Ana I. Martínez-García. Activity recognition for context-aware
hospital applications: issues and opportunities for the deployment of perva-
sive networks. Mobile Networks and Applications, 12(2-3):155–171, March
2007.
[48] Davide Figo, Pedro C. Diniz, Diogo R. Ferreira, and João M.P. Cardoso. Prepro-
cessing techniques for context recognition from accelerometer data. Personal
and Ubiquitous Computing, 14(7):645–662, October 2010.
[52] Samuel M. Fox, John P. Naughton, and William L. Haskell. Physical activity
and the prevention of coronary heart disease. Annals of Clinical Research,
3(6):404–432, December 1971.
[53] Korbinian Frank, Maria Josefa Vera Nadales, Patrick Robertson, and Tom
Pfeifer. Bayesian recognition of motion related activities with inertial sensors.
In Proceedings of 12th International Conference on Ubiquitous Computing
(UbiComp), pages 445–446, Copenhagen, Denmark, September 2010.
[56] Yoav Freund and Robert E. Schapire. Response to Mease and Wyner, Evidence
contrary to the statistical view of boosting, JMLR 9:131-156, 2008. Journal of
Machine Learning Research, 9:171–174, June 2008.
[57] Peter W. Frey and David J. Slate. Letter recognition using Holland-style adap-
tive classifiers. Machine Learning, 6(2):161–182, March 1991.
[58] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive logistic re-
gression: a statistical view of boosting. The Annals of Statistics, 28(2):337–407,
2000.
[59] Yuichi Fujiki, Konstantinos Kazakos, Colin Puri, Pradeep Buddharaju, Ioannis
Pavlidis, and James Levine. NEAT-o-Games: blending physical activity and fun
in the daily routine. Computers in Entertainment, 6(2), July 2008.
[61] Dawud Gordon, Hedda Rahel Schmidtke, Michael Beigl, and Georg Von Zen-
gen. A novel micro-vibration sensor for activity recognition: potential and lim-
itations. In Proceedings of IEEE 14th International Symposium on Wearable
Computers (ISWC), Seoul, South Korea, October 2010.
[62] Dawud Gordon, Jürgen Czerny, Takashi Miyaki, and Michael Beigl. Energy-
efficient activity recognition using prediction. In Proceedings of IEEE 16th
International Symposium on Wearable Computers (ISWC), pages 29–36, New-
castle, UK, June 2012.
[63] Norbert Győrbíró, Ákos Fábián, and Gergely Hományi. An activity recognition
system for mobile phones. Mobile Networks and Applications, 14(1):82–91,
February 2009.
[64] Eija Haapalainen, SeungJun Kim, Jodi F. Forlizzi, and Anind K. Dey. Psycho-
physiological measures for assessing cognitive load. In Proceedings of 12th
International Conference on Ubiquitous Computing (UbiComp), pages 301–
310, Copenhagen, Denmark, September 2010.
[65] Mark A. Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reute-
mann, and Ian H. Witten. The WEKA data mining software: an update.
SIGKDD Explorations Newsletter, 11(1):10–18, November 2009.
[66] William L. Haskell, I-Min Lee, Russell R. Pate, Kenneth E. Powell, Steven N.
Blair, Barry A. Franklin, Caroline A. Macera, Gregory W. Heath, Paul D. Thomp-
son, and Adrian Bauman. Physical activity and public health: Updated recom-
mendation for adults from the American College of Sports Medicine and the
American Heart Association. Medicine and Science in Sports and Exercise,
39(8):1423–1434, August 2007.
[67] Yi He, Ye Li, and Shu-Di Bao. Fall detection by built-in tri-accelerometer
of smartphone. In Proceedings of IEEE-EMBS International Conference on
Biomedical and Health Informatics (BHI), pages 184–187, Hong Kong, China,
January 2012.
[69] Gerold Hölzl, Marc Kurz, and Alois Ferscha. Goal oriented opportunistic recog-
nition of high-level composed activities using dynamically configured hidden
Markov models. In Proceedings of 3rd International Conference on Ambient
Systems, Networks and Technologies (ANT), pages 308–315, Niagara Falls, ON,
Canada, August 2012.
[70] Chih-wei Hsu, Chih-chung Chang, and Chih-jen Lin. A practical guide to
support vector classification. Bioinformatics, 1(1):1–16, 2010.
[71] Bing Hu, Yanping Chen, and Eamonn J. Keogh. Time series classification un-
der more realistic assumptions. In SIAM Conference on Data Mining (SDM),
Austin, TX, USA, May 2013.
[72] Jian Huang, Seyda Ertekin, Yang Song, Hongyuan Zha, and C. Lee Giles. Effi-
cient multiclass boosting classification with active learning. In SIAM Interna-
tional Conference on Data Mining (SDM), Minneapolis, MN, USA, April 2007.
[73] Tzu-Kuo Huang and Jeff Schneider. Spectral learning of Hidden Markov Mod-
els from dynamic and static data. In Proceedings of 30th International Confer-
ence on Machine Learning (ICML), Atlanta, GA, USA, June 2013.
[74] Tâm Huynh. Human activity recognition with wearable sensors. PhD thesis,
TU Darmstadt, September 2008.
[75] Tâm Huynh and Bernt Schiele. Analyzing features for activity recognition. In
Proceedings of Joint Conference on Smart Objects and Ambient Intelligence
(sOc-EuSAI), pages 159–163, Grenoble, France, October 2005.
[76] Tâm Huynh and Bernt Schiele. Towards less supervision in activity recognition
from wearable sensors. In Proceedings of IEEE 10th International Symposium
on Wearable Computers (ISWC), pages 3–10, Montreux, Switzerland, October
2006.
[77] Tâm Huynh and Bernt Schiele. Unsupervised discovery of structure in activity
data using multiple eigenspaces. In Proceedings of 2nd International Work-
shop on Location- and Context-Awareness (LoCA), Dublin, Ireland, May 2006.
[78] Tâm Huynh, Ulf Blanke, and Bernt Schiele. Scalable recognition of daily ac-
tivities with wearable sensors. In Proceedings of 3rd International Workshop
on Location- and Context-Awareness (LoCA), pages 50–67, Oberpfaffenhofen,
Germany, September 2007.
[79] Tâm Huynh, Mario Fritz, and Bernt Schiele. Discovery of activity patterns
using topic models. In Proceedings of 10th International Conference on Ubiq-
uitous Computing (UbiComp), pages 10–19, Seoul, South Korea, September
2008.
[80] Stephen S. Intille, Kent Larson, Emmanuel Munguia Tapia, Jennifer S. Beaudin,
Pallavi Kaushik, Jason Nawyn, and Randy Rockinson. Using a live-in labo-
ratory for ubiquitous computing research. In Proceedings of 4th International
Conference on Pervasive Computing (PERVASIVE), pages 349–365, Dublin, Ire-
land, May 2006.
[81] Xiaobo Jin, Xinwen Hou, and Cheng-Lin Liu. Multi-class AdaBoost with hy-
pothesis margin. In Proceedings of 20th International Conference on Pattern
Recognition (ICPR), pages 65–68, Washington, DC, USA, August 2010.
[82] Darcy L. Johannsen, Miguel Andres Calabro, Jeanne Stewart, Warren Franke,
Jennifer C. Rood, and Gregory J. Welk. Accuracy of armband monitors for
measuring daily energy expenditure in healthy adults. Medicine and Science
in Sports and Exercise, 42(11):2134–2140, November 2010.
[84] Sidney Katz, Amasa B. Ford, Roland W. Moskowitz, Beverly A. Jackson, and
Marjorie W. Jaffe. Studies of illness in the aged. The index of ADL: a standardized
measure of biological and psychosocial function. JAMA, 185:914–919, Septem-
ber 1963.
[86] Jennifer R. Kwapisz, Gary M. Weiss, and Samuel A. Moore. Activity recognition
using cell phone accelerometers. In Proceedings of 4th International Workshop
on Knowledge Discovery from Sensor Data (SensorKDD), pages 74–82, Wash-
ington, DC, USA, July 2010.
[87] Cassim Ladha, Nils Y. Hammerla, Patrick Olivier, and Thomas Plötz. ClimbAX:
skill assessment for climbing enthusiasts. In Proceedings of 2013 ACM Inter-
national Joint Conference on Pervasive and Ubiquitous Computing (UbiComp),
pages 235–244, Zurich, Switzerland, September 2013.
[88] Nicholas D. Lane, Emiliano Miluzzo, Hong Lu, Daniel Peebles, Tanzeem Choud-
hury, and Andrew T. Campbell. A survey of mobile phone sensing. IEEE Com-
munications Magazine, 48(9):140–150, September 2010.
[89] Óscar D. Lara, Alfredo J. Pérez, Miguel A. Labrador, and José D. Posada. Cen-
tinela: A human activity recognition system based on acceleration and vital
sign data. Pervasive and Mobile Computing, 8(5):717–729, October 2012.
[90] Sian Lun Lau, Immanuel König, Klaus David, Baback Parandian, Christine
Carius-Düssel, and Martin Schultz. Supporting patient monitoring using activ-
ity recognition with a smartphone. In Proceedings of 7th International Sympo-
sium on Wireless Communication Systems (ISWCS), pages 810–814, York, UK,
September 2010.
[91] M. Powell Lawton and Elaine M. Brody. Assessment of older people:
self-maintaining and instrumental activities of daily living. Gerontologist, 9(3):179–186,
1969.
[93] Mi-hee Lee, Jungchae Kim, Kwangsoo Kim, Inho Lee, Sun Ha Jee, and Sun Kook
Yoo. Physical activity recognition using a single tri-axis accelerometer. In Pro-
ceedings of World Congress on Engineering and Computer Science (WCECS),
San Francisco, CA, USA, October 2009.
[94] Myong-Woo Lee, Adil Mehmood Khan, Ji-Hwan Kim, Young-Sun Cho, and Tae-
Seong Kim. A single tri-axial accelerometer-based real-time personal life log
system capable of activity classification and exercise information generation.
In Proceedings of 32nd Annual International IEEE EMBS Conference, pages
1390–1393, Buenos Aires, Argentina, August-September 2010.
[95] Qiang Li, John A. Stankovic, Mark A. Hanson, Adam T. Barth, John Lach, and
Gang Zhou. Accurate, fast fall detection using gyroscopes and accelerometer-
derived posture information. In Proceedings of 6th International Workshop
on Wearable and Implantable Body Sensor Networks (BSN), pages 138–143,
Berkeley, CA, USA, June 2009.
[96] Lin Liao, Dieter Fox, and Henry Kautz. Extracting places and activities from
GPS traces using hierarchical conditional random fields. International Journal
of Robotics Research, 26(1):119–134, January 2007.
[97] James J. Lin, Lena Mamykina, Silvia Lindtner, Gregory Delajoux, and Henry B.
Strub. Fish’n’Steps: encouraging physical activity with an interactive computer
game. In Proceedings of 8th International Conference on Ubiquitous Comput-
ing (UbiComp), pages 261–278, Orange County, CA, USA, September 2006.
[98] Shaopeng Liu, Robert X. Gao, Dinesh John, John Staudenmayer, and Patty S.
Freedson. SVM-based multi-sensor fusion for free-living physical activity as-
sessment. In Proceedings of 33rd Annual International IEEE EMBS Conference,
pages 3188–3191, Boston, MA, USA, August-September 2011.
[99] Jeffrey W. Lockhart, Tony Pulickal, and Gary M. Weiss. Applications of mobile
activity recognition. In Proceedings of 14th International Conference on Ubiq-
uitous Computing (UbiComp), pages 1054–1058, Pittsburgh, PA, USA, Septem-
ber 2012.
[100] Xi Long, Bin Yin, and Ronald M. Aarts. Single-accelerometer based daily phys-
ical activity classification. In Proceedings of 31st Annual International IEEE
EMBS Conference, pages 6107–6110, Minneapolis, MN, USA, September 2009.
[101] Paul Lukowicz, Jamie A. Ward, Holger Junker, Mathias Stäger, Gerhard Tröster,
Amin Atrash, and Thad E. Starner. Recognizing workshop activity using body
worn microphones and accelerometers. In Proceedings of 2nd International
Conference on Pervasive Computing (PERVASIVE), pages 18–32, Linz/Vienna,
Austria, April 2004.
[102] Paul Lukowicz, Andreas Timm-Giel, Michael Lawo, and Otthein Herzog.
WearIT@work: toward real-world industrial wearable computing. IEEE Per-
vasive Computing, 6(4):8–13, October 2007.
[103] Paul Lukowicz, Gerald Pirkl, David Bannach, Florian Wagner, Alberto
Calatroni, Kilian Förster, Thomas Holleczek, Mirco Rossi, Daniel Roggen, Gerhard
Tröster, Jakob Doppler, Clemens Holzmann, Andreas Riener, Alois Ferscha,
and Ricardo Chavarriaga. Recording a complex, multi modal activity data set
for context recognition. In Proceedings of 23rd International Conference on Ar-
chitecture of Computing Systems (ARCS), 1st Workshop on Context-Systems
Design, Evaluation and Optimisation (CosDEO), Hannover, Germany, Febru-
ary 2010.
[104] Mitja Luštrek and Boštjan Kaluža. Fall detection and activity recognition with
machine learning. Informatica, 33(2):205–212, 2009.
[105] Takuya Maekawa and Shinji Watanabe. Unsupervised activity recognition with
user’s physical characteristics data. In Proceedings of IEEE 15th International
Symposium on Wearable Computers (ISWC), pages 89–96, San Francisco, CA,
USA, June 2011.
[106] Takuya Maekawa, Yutaka Yanagisawa, Yasue Kishino, Katsuhiko Ishiguro, Koji
Kamei, Yasushi Sakurai, and Takeshi Okadome. Object-based activity recogni-
tion with heterogeneous sensors on wrist. In Proceedings of 8th International
Conference on Pervasive Computing (PERVASIVE), pages 246–264, Helsinki,
Finland, May 2010.
[107] Takuya Maekawa, Yasue Kishino, Yutaka Yanagisawa, and Yasushi Sakurai.
Mimic sensors: battery-shaped sensor node for detecting electrical events of
handheld devices. In Proceedings of 10th International Conference on Perva-
sive Computing (PERVASIVE), pages 20–38, Newcastle, UK, June 2012.
[108] Dominic Maguire and Richard Frisby. Comparison of feature classification al-
gorithm for activity recognition based on accelerometer and heart rate data. In
Proceedings of 9th IT & T Conference, Dublin, Ireland, October 2009.
[109] Jussi Mattila, Hang Ding, and Elina Mattila. Mobile tools for home-based car-
diac rehabilitation based on heart rate and movement activity analysis. In
Proceedings of 31st Annual International IEEE EMBS Conference, pages 6448–
6452, Minneapolis, MN, USA, September 2009.
[110] David Mease and Abraham Wyner. Evidence contrary to the statistical view of
boosting. Journal of Machine Learning Research, 9:131–156, June 2008.
[111] David Minnen, Tracy Westeyn, Daniel Ashbrook, Peter Presti, and Thad E.
Starner. Recognizing soldier activities in the field. In Proceedings of 4th Inter-
national Workshop on Wearable and Implantable Body Sensor Networks (BSN),
pages 236–241, Aachen, Germany, March 2007.
[112] Lingfei Mo, Shaopeng Liu, Robert X. Gao, Dinesh John, John Staudenmayer,
and Patty S. Freedson. ZigBee-based wireless multi-sensor system for physical
activity assessment. In Proceedings of 33rd Annual International IEEE EMBS
Conference, pages 846–849, Boston, MA, USA, August-September 2011.
[113] Miriam E. Nelson, W. Jack Rejeski, Steven N. Blair, Pamela W. Duncan, James O.
Judge, Abby C. King, Carol A. Macera, and Carmen Castaneda-Sceppa. Physi-
cal activity and public health in older adults: recommendation from the Amer-
ican College of Sports Medicine and the American Heart Association. Circula-
tion, 116(9):1094–1105, 2007.
[114] ROVING Networks. Roving Networks development and product website, 2013-09-16.
URL http://www.rovingnetworks.com.
[115] Georg Ogris, Thomas Stiefmeier, Paul Lukowicz, and Gerhard Tröster. Using a
complex multi-modal on-body sensor system for activity spotting. In Proceed-
ings of IEEE 12th International Symposium on Wearable Computers (ISWC),
pages 55–62, Pittsburgh, PA, USA, September-October 2008.
[117] Juha Pärkkä, Miikka Ermes, Panu Korpipää, Jani Mäntyjärvi, Johannes Peltola,
and Ilkka Korhonen. Activity classification using realistic data from wearable
sensors. IEEE Transactions on Information Technology in Biomedicine, 10(1):
119–128, January 2006.
[118] Juha Pärkkä, Miikka Ermes, K. Antila, Mark van Gils, A. Mänttäri, and H. Niem-
inen. Estimating intensity of physical activity: a comparison of wearable ac-
celerometer and gyro sensors and 3 sensor locations. In Proceedings of 29th
Annual International IEEE EMBS Conference, pages 1511–1514, Lyon, France,
August 2007.
[119] Juha Pärkkä, Juho Merilahti, Elina M. Mattila, Esko Malm, Kari Antila, Martti T.
Tuomisto, Ari Viljam Saarinen, Mark van Gils, and Ilkka Korhonen. Relation-
ship of psychological and physiological variables in long-term self-monitored
data during work ability rehabilitation program. IEEE Transactions on Infor-
mation Technology in Biomedicine, 13(2):141–151, March 2009.
[120] Juha Pärkkä, Luc Cluitmans, and Miikka Ermes. Personalization algorithm for
real-time activity recognition using PDA, wireless motion bands, and binary
decision tree. IEEE Transactions on Information Technology in Biomedicine,
14(5):1211–1215, September 2010.
[121] Kurt Partridge and Bo Begole. Activity-based advertising: techniques and chal-
lenges. In Proceedings of 1st Workshop on Pervasive Advertising, Nara, Japan,
May 2009.
[122] Shyamal Patel, Chiara Mancinelli, Jennifer Healey, Marilyn Moy, and Paolo
Bonato. Using wearable sensors to monitor physical activities of patients with
COPD: a comparison of classifier performance. In Proceedings of 6th Interna-
tional Workshop on Wearable and Implantable Body Sensor Networks (BSN),
pages 234–239, Berkeley, CA, USA, June 2009.
[123] Matthai Philipose, Kenneth P. Fishkin, Mike Perkowitz, Donald J. Patterson, Di-
eter Fox, Henry Kautz, and Dirk Hähnel. Inferring activities from interactions
with objects. IEEE Pervasive Computing, 3(4):50–57, October 2004.
[124] Susanna Pirttikangas, Kaori Fujinami, and Tatsuo Nakajima. Feature selection
and activity recognition from wearable sensors. In Proceedings of 3rd Interna-
tional Symposium on Ubiquitous Computing Systems (UCS), pages 516–527,
Seoul, South Korea, October 2006.
[125] Ronald Poppe. A survey on vision-based human action recognition. Image and
Vision Computing, 28(6):976–990, June 2010.
[126] J. Ross Quinlan. C4.5: programs for machine learning. Morgan Kaufmann,
San Mateo, 1993.
[127] J. Ross Quinlan. Bagging, boosting and C4.5. In Proceedings of 13th Na-
tional Conference on Artificial Intelligence (AAAI), pages 725–730, Portland,
OR, USA, August 1996.
[128] J. Ross Quinlan, Paul J. Compton, K. A. Horn, and Leslie Lazarus. Inductive
knowledge acquisition: a case study. In Proceedings of 2nd Australian Con-
ference on Applications of Expert Systems, pages 137–156, Sydney, Australia,
May 1986.
[130] Thanawin Rakthanmanon, Eamonn J. Keogh, Stefano Lonardi, and Scott Evans.
MDL-based time series clustering. Knowledge and Information Systems, 33(2):
371–399, November 2012.
[131] Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L. Littman.
Activity recognition from accelerometer data. In Proceedings of 17th Confer-
ence on Innovative Applications of Artificial Intelligence (IAAI), pages 1541–
1546, Pittsburgh, PA, USA, July 2005.
[132] Attila Reiss. PAMAP2 Physical Activity Monitoring Data Set, 2013-09-16.
URL http://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring.
[133] Attila Reiss and Didier Stricker. Towards global aerobic activity monitoring. In
Proceedings of 4th International Conference on Pervasive Technologies Related
to Assistive Environments (PETRA), Crete, Greece, May 2011.
[134] Attila Reiss and Didier Stricker. Introducing a modular activity monitoring
system. In Proceedings of 33rd Annual International IEEE EMBS Conference,
pages 5621–5624, Boston, MA, USA, August-September 2011.
[135] Attila Reiss and Didier Stricker. Creating and benchmarking a new dataset for
physical activity monitoring. In Proceedings of 5th Workshop on Affect and
Behaviour Related Assistance (ABRA), Crete, Greece, June 2012.
[136] Attila Reiss and Didier Stricker. Introducing a new benchmarked dataset for
activity monitoring. In Proceedings of IEEE 16th International Symposium on
Wearable Computers (ISWC), pages 108–109, Newcastle, UK, June 2012.
[137] Attila Reiss and Didier Stricker. Aerobic activity monitoring: towards a long-
term approach. International Journal of Universal Access in the Information
Society (UAIS), March 2013.
[138] Attila Reiss and Didier Stricker. Personalized mobile physical activity recog-
nition. In Proceedings of IEEE 17th International Symposium on Wearable
Computers (ISWC), Zurich, Switzerland, September 2013.
[139] Attila Reiss, Markus Weber, and Didier Stricker. Exploring and extending the
boundaries of physical activity recognition. In Proceedings of 2011 IEEE In-
ternational Conference on Systems, Man and Cybernetics (SMC), Workshop on
Robust Machine Learning Techniques for Human Activity Recognition, Anchorage,
AK, USA, October 2011.
[140] Attila Reiss, Ilias Lamprinos, and Didier Stricker. An integrated mobile system
for long-term aerobic activity monitoring and support in daily life. In Proceed-
ings of 2012 International Symposium on Advances in Ubiquitous Computing
and Networking (AUCN), Liverpool, UK, June 2012.
[141] Attila Reiss, Gustaf Hendeby, and Didier Stricker. A competitive approach
for human activity recognition on smartphones. In Proceedings of 21st Euro-
pean Symposium on Artificial Neural Networks, Computational Intelligence
and Machine Learning (ESANN), Bruges, Belgium, April 2013.
[142] Attila Reiss, Gustaf Hendeby, and Didier Stricker. Towards robust activity
recognition for everyday life: methods and evaluation. In Proceedings of 7th
International Conference on Pervasive Computing Technologies for Healthcare
(PervasiveHealth), Venice, Italy, May 2013.
[143] Attila Reiss, Gustaf Hendeby, and Didier Stricker. Confidence-based multiclass
AdaBoost for physical activity monitoring. In Proceedings of IEEE 17th In-
ternational Symposium on Wearable Computers (ISWC), Zurich, Switzerland,
September 2013.
[144] Daniel Roggen, Alberto Calatroni, Mirco Rossi, Thomas Holleczek, Kilian
Förster, Gerhard Tröster, Paul Lukowicz, David Bannach, Gerald Pirkl, Alois
Ferscha, Jakob Doppler, Clemens Holzmann, Marc Kurz, Gerald Holl, Ricardo
Chavarriaga, Hesam Sagha, Hamidreza Bayati, Marco Creatura, and José del R.
Millán. Collecting complex activity datasets in highly rich networked sensor
environments. In Proceedings of 7th International Conference on Networked
Sensing Systems (INSS), pages 233–240, Kassel, Germany, June 2010.
[145] Daniel Roggen, Stéphane Magnenat, Markus Waibel, and Gerhard Tröster.
Wearable computing: designing and sharing activity recognition systems
across platforms. IEEE Robotics and Automation Magazine, 18(2):83–95, June
2011.
[146] Mirco Rossi, Gerhard Tröster, and Oliver Amft. Recognizing daily life con-
text using web-collected audio data. In Proceedings of IEEE 16th International
Symposium on Wearable Computers (ISWC), pages 25–28, Newcastle, UK, June
2012.
[147] Stuart J. Russell and Peter Norvig. Artificial Intelligence: a modern approach.
Prentice Hall, Englewood Cliffs, 2010.
[148] Maytal Saar-Tsechansky and Foster Provost. Handling missing values when
applying classification models. Journal of Machine Learning Research, 8:1625–
1657, December 2007.
[149] Hesam Sagha, Sundara Tejaswi Digumarti, Ricardo Chavarriaga, Alberto Cala-
troni, Daniel Roggen, and Gerhard Tröster. Benchmarking classification tech-
niques using the Opportunity human activity dataset. In Proceedings of 2011
IEEE International Conference on Systems, Man and Cybernetics (SMC), Work-
shop on Robust Machine Learning Techniques for Human Activity Recognition,
pages 36–40, Anchorage, AK, USA, October 2011.
[151] Ralf Salomon, Marian Lüder, and Gerald Bieber. iFall - a new embedded sys-
tem for the detection of unexpected falls. In Proceedings of 8th IEEE Inter-
national Conference on Pervasive Computing and Communications (PerCom),
pages 286–291, Mannheim, Germany, March-April 2010.
[152] Robert E. Schapire. The strength of weak learnability. Machine Learning, 5(2):
197–227, June 1990.
[153] Robert E. Schapire. Using output codes to boost multiclass learning problems.
In Proceedings of 14th International Conference on Machine Learning (ICML),
pages 313–321, Nashville, TN, USA, July 1997.
[155] Robert E. Schapire and Yoram Singer. Improved boosting algorithms using
confidence-rated predictions. Machine Learning, 37(3):297–336, December
1999.
[156] Bill N. Schilit, Norman Adams, and Roy Want. Context-aware computing appli-
cations. In Proceedings of First Workshop on Mobile Computing Systems and
Applications (WMCSA), pages 85–90, Santa Cruz, CA, USA, December 1994.
[157] Markus Scholz, Stephan Sigg, Gerrit Bagschik, Toni Guenther, Georg von
Zengen, Dimana Shishkova, Yusheng Ji, and Michael Beigl. SenseWaves: ra-
diowaves for context recognition. In Proceedings of 9th International Con-
ference on Pervasive Computing (PERVASIVE), San Francisco, CA, USA, June
2011.
[158] Markus Scholz, Stephan Sigg, Hedda R. Schmidtke, and Michael Beigl. Chal-
lenges for device-free radio-based activity recognition. In Proceedings of 8th
International ICST Conference on Mobile and Ubiquitous Systems (MobiQui-
tous), 3rd Workshop on Context-Systems Design, Evaluation and Optimisation
(CosDEO), Copenhagen, Denmark, December 2011.
[159] Burr Settles. Active learning literature survey. Technical Report 1648, Com-
puter Sciences, University of Wisconsin-Madison, 2009.
[161] Hua Si, Seung Jin Kim, Nao Kawanishi, and Hiroyuki Morikawa. A context-
aware reminding system for daily activities of dementia patients. In Pro-
ceedings of 27th International Conference on Distributed Computing Systems
Workshops (ICDCS), Toronto, ON, Canada, June 2007.
[162] J. Paul Siebert. Vehicle recognition using rule based methods. Turing Institute
(Glasgow, Scotland), 1987.
[164] Claudio De Stefano, Antonio Della Cioppa, and Angelo Marcelli. An adaptive
weighted majority vote rule for combining multiple classifiers. In Proceedings
of 16th International Conference on Pattern Recognition (ICPR), pages 192–
195, Quebec, QC, Canada, August 2002.
[165] Thomas Stiefmeier, Daniel Roggen, Georg Ogris, Paul Lukowicz, and Gerhard
Tröster. Wearable activity tracking in car manufacturing. IEEE Pervasive Com-
puting, 7(2):42–50, April 2008.
[166] Maja Stikic, Tâm Huynh, Kristof van Laerhoven, and Bernt Schiele. ADL
recognition based on the combination of RFID and accelerometer sensing. In
Proceedings of 2nd International Conference on Pervasive Computing Tech-
nologies for Healthcare (PervasiveHealth), pages 258–263, Tampere, Finland,
January-February 2008.
[167] Johannes A. Stork, Luciano Spinello, Jens Silva, and Kai O. Arras. Audio-based
human activity recognition using non-Markovian ensemble voting. In Pro-
ceedings of IEEE International Symposium on Robot and Human Interactive
Communication (RO-MAN), pages 509–514, Paris, France, September 2012.
[168] Christina Strohrmann, Holger Harms, and Gerhard Tröster. What do sensors
know about your running performance? In Proceedings of IEEE 15th Interna-
tional Symposium on Wearable Computers (ISWC), pages 101–104, San Fran-
cisco, CA, USA, June 2011.
[169] Amarnag Subramanya, Alvin Raj, Jeff Bilmes, and Dieter Fox. Recognizing activ-
ities and spatial context using wearable sensors. In Proceedings of 22nd Con-
ference on Uncertainty in Artificial Intelligence (UAI), Cambridge, MA, USA,
July 2006.
[170] Feng-Tso Sun, Cynthia Kuo, Heng-Tze Cheng, Senaka Buthpitiya, Patricia
Collins, and Martin L. Griss. Activity-aware mental stress detection using phys-
iological signals. In Proceedings of 2nd International ICST Conference on Mo-
bile Computing, Applications, and Services (MobiCASE), pages 211–230, Santa
Clara, CA, USA, October 2010.
[171] Xu Sun, Hisashi Kashima, Ryota Tomioka, Naonori Ueda, and Ping Li. A new
multi-task learning method for personalized activity recognition. In Proceed-
ings of IEEE 11th International Conference on Data Mining (ICDM), pages
1218–1223, Vancouver, BC, Canada, December 2011.
[172] Sudeep Sundaram and Walterio W. Mayol Cuevas. High level activity recogni-
tion using low resolution wearable vision. In Proceedings of IEEE Computer
Society Conference on Computer Vision and Pattern Recognition (CVPR),
Workshop on Egocentric Vision, pages 25–32, Miami, FL, USA, June 2009.
[173] Emmanuel Munguia Tapia, Stephen S. Intille, William Haskell, Kent Larson,
Julie Wright, Abby King, and Robert Friedman. Real-time recognition of physi-
cal activities and their intensities using wireless accelerometers and a heart rate
monitor. In Proceedings of IEEE 11th International Symposium on Wearable
Computers (ISWC), pages 1–4, Boston, MA, USA, October 2007.
[174] Moritz Tenorth, Jan Bandouch, and Michael Beetz. The TUM Kitchen data
set of everyday manipulation activities for motion tracking and action recog-
nition. In Proceedings of IEEE 12th International Conference on Computer
Vision (ICCV), Workshop on Tracking Humans for the Evaluation of Their Mo-
tion in Image Sequences (THEMIS), Kyoto, Japan, September-October 2009.
[177] Dorra Trabelsi, Samer Mohammed, Faicel Chamroukhi, Latifa Oukhellou, and
Yacine Amirat. Supervised and unsupervised classification approaches for hu-
man activity recognition using body-mounted sensors. In Proceedings of 20th
European Symposium on Artificial Neural Networks, Computational Intelli-
gence and Machine Learning (ESANN), pages 417–422, Bruges, Belgium, April
2012.
[179] Wallace Ugulino, Debora Cardador, Katia Vega, Eduardo Velloso, Ruy Milidiú,
and Hugo Fuks. Wearable computing: accelerometers’ data classification of
body postures and movements. In Proceedings of 21st Brazilian Symposium
on Artificial Intelligence (SBIA), pages 52–61, Curitiba, Brazil, October 2012.
[180] Tim L.M. van Kasteren, Athanasios Noulas, Gwenn Englebienne, and Ben
Kröse. Accurate activity recognition in a home setting. In Proceedings of
10th International Conference on Ubiquitous Computing (UbiComp), pages
1–9, Seoul, South Korea, September 2008.
[181] Tim L.M. van Kasteren, Hande Alemdar, and Cem Ersoy. Effective performance
metrics for evaluating activity recognition methods. In Proceedings of 24th In-
ternational Conference on Architecture of Computing Systems (ARCS), Como,
Italy, February 2011.
[182] Tim L.M. van Kasteren, Gwenn Englebienne, and Ben Kröse. Hierarchical ac-
tivity recognition using automatically clustered actions. In Proceedings of 2nd
International Conference on Ambient Intelligence (AmI), pages 82–91, Amster-
dam, Netherlands, November 2011.
[183] Kristof van Laerhoven, Marko Borazio, David Kilian, and Bernt Schiele. Sus-
tained logging and discrimination of sleep postures with low-level, wrist-worn
sensors. In Proceedings of IEEE 12th International Symposium on Wearable
Computers (ISWC), pages 69–76, Pittsburgh, PA, USA, September-October
2008.
[186] Elena Villalba, Manuel Ottaviano, María Teresa Arredondo, A. Martinez, and
S. Guillen. Wearable monitoring system for heart failure assessment in a mobile
environment. Computers in Cardiology, 33:237–240, 2006.
[187] Matteo Voleno, Stephen J. Redmond, Sergio Cerutti, and Nigel H. Lovell. En-
ergy expenditure estimation using triaxial accelerometry and barometric pres-
sure measurement. In Proceedings of 32nd Annual International IEEE EMBS
Conference, pages 5185–5188, Buenos Aires, Argentina, August-September
2010.
[188] Jamie A. Ward, Paul Lukowicz, Gerhard Tröster, and Thad E. Starner. Activity
recognition of assembly tasks using body-worn microphones and accelerome-
ters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):
1553–1567, October 2006.
[189] Jamie A. Ward, Paul Lukowicz, and Hans W. Gellersen. Performance metrics for
activity recognition. ACM Transactions on Intelligent Systems and Technology,
2(1), January 2011. Article No. 6.
[201] Zhongtang Zhao, Yiqiang Chen, Junfa Liu, Zhiqi Shen, and Mingjie Liu. Cross-
people mobile-phone based activity recognition. In Proceedings of 22nd Inter-
national Joint Conference on Artificial Intelligence (IJCAI), pages 2545–2550,
Barcelona, Spain, July 2011.
[202] Ji Zhu, Saharon Rosset, Hui Zou, and Trevor Hastie. Multi-class AdaBoost. Tech-
nical Report 430, Department of Statistics, University of Michigan, 2005.
[203] Ji Zhu, Hui Zou, Saharon Rosset, and Trevor Hastie. Multi-class AdaBoost.
Statistics and Its Interface, 2:349–360, 2009.
[205] Andreas Zinnen, Ulf Blanke, and Bernt Schiele. An analysis of sensor-oriented
vs. model-based activity recognition. In Proceedings of IEEE 13th Interna-
tional Symposium on Wearable Computers (ISWC), pages 93–100, Linz, Aus-
tria, September 2009.
Curriculum Vitæ
Education
Jan 2014 Doctor of Engineering, Department of Computer Science,
Technical University of Kaiserslautern
Thesis: Personalized Mobile Physical Activity Monitoring for
Everyday Life
Professional Experience
Feb 2009 - Dec 2013 Researcher, Department of Augmented Vision,
German Research Center for Artificial Intelligence
(DFKI), Kaiserslautern, Germany
Feb 2009 - Jan 2011 Software engineer (50%), Rittal GmbH & Co. KG,
Herborn, Germany
Attila Reiss
Kaiserslautern, 10 January 2014.