Annals of Mathematics and Artificial Intelligence (2006) 46: 349–374
DOI: 10.1007/s10472-006-9022-5
# Springer 2006
Annals of Mathematics and Artificial Intelligence, Volume 46, Number 3, pp 349-374, March 2006.
URL: http://www.springerlink.com/link.asp?id=05t1276t2137143u
The original publication is available at springerlink.com
Copyright © Springer-Verlag
An algebraic framework for temporal attribute
characteristics
Michael Böhlen a, Johann Gamper a and Christian S. Jensen b
a
Faculty of Computer Science, Free University of Bozen-Bolzano, Dominikanerplatz 3,
39100 Bolzano, Italy
E-mail: {boehlen; gamper}@inf.unibz.it
b
Aalborg University, Aalborg, Denmark
E-mail:
[email protected]
Received 24 January 2005; Revised 15 September 2005; Accepted 14 November 2005
Most real-world database applications manage temporal data, i.e., data with associated
time references that capture a temporal aspect of the data, typically either when the data is
valid or when the data is known. Such applications abound in, e.g., the financial, medical,
and scientific domains. In contrast to this, current database management systems offer
preciously little built-in query language support for temporal data management. This
situation persists although an active temporal database research community has
demonstrated that application development can be simplified substantially by built-in
temporal support. This paper’s contribution is motivated by the observation that existing
temporal data models and query languages generally make the same rigid assumption about
the semantics of the association of data and time, namely that if a subset of the time domain
is associated with some data then this implies the association of any further subset with the
data. This paper offers a comprehensive, general framework where alternative semantics
may co-exist. It supports so-called malleable and atomic temporal associations, in addition
to the conventional ones mentioned above, which are termed constant. To demonstrate the
utility of the framework, the paper defines a characteristics-enabled temporal algebra,
termed CETA, which defines the traditional relational operators in the new framework. This
contribution demonstrates that it is possible to provide built-in temporal support while
making less rigid assumptions about the data and without jeopardizing the degree of the
support. This moves temporal support closer to practical applications.
Keywords: temporal databases, temporal algebra, attribute characteristics, malleable
attributes, temporal data semantics
AMS subject classification: 68P15
1.
Introduction
Time-referenced, or temporal, data abounds in contemporary database management applications. Due perhaps to the ubiquity of temporal data and evident
difficulties in managing such data using presently known concepts and techniques,
350
Böhlen et al. / An algebraic framework for temporal attribute characteristics
several research communities have studied the management of temporal data quite
extensively, for the last several decades [1, 4, 7, 24, 25, 29].
The artificial intelligence research community has devoted considerable efforts to
the study of the modeling of reality using different types of temporal entities or
propositions, and this community distinguishes among such concepts as, e.g.,
properties, events, and processes [5]. However, it has not been a main concern to
integrate these concepts into the data models and query languages of the contemporary
database management systems.
In contrast, the database research community has adopted a more systems-oriented
approach and has favored quite rigid assumptions about the semantics of temporal data
[12, 18, 21, 23]. In particular, strict point-based and snapshot-equivalence-based
semantics have been favored. With these semantics, two temporal relations are
identical if their pairs of snapshots at all points in time are identical. It is typically
assumed that if some data is associated with a subset of the time domain then it is also
associated with any subset of this subset. For example, if an employee is assigned to a
project during some week, it is assumed to also be true that the employee is assigned
to the project during, say, the Tuesday of that week. This assumption enables the
design of relatively simple temporal data models and query languages that offer
advanced built-in support for managing temporal data. However, this assumption also
limits the generality and applicability of the resulting models and query languages.
This paper aims to generalize temporal data models and query languages, thus
broadening their applicability; and it aims to do so without sacrificing the degree of
built-in support. In particular, this paper proposes a framework where different
semantics may be assigned to the association of data with time values. For example,
assume that our employee works for 40 h during some week. This does not mean that
it is true that the employee worked for 40 h during Tuesday of that week. Rather, we
would expect that the employee worked for perhaps 8 h during each workday. The
framework proposed here accommodates a range of semantics such as this one. We
will refer to these different semantics by saying that the attributes have different
characteristics.
The framework is based on the relational model and the relational algebra. An
algebraic framework is chosen to avoid issues to do with syntax and language design,
instead enabling a focus on semantics. The framework satisfies basic properties and
permits various temporal semantics. In the framework, where the time domain is
assumed to be discrete, timestamps in the form of time intervals, i.e., convex sets of
time points, are used for capturing the temporal aspects of data.
The framework enables operations on data to respect, in a precise technical
sense, the grouping of time points into timestamps. More precisely, the grouping of
time points in the timestamps of the tuples that result from algebraic operations is
unique: Timestamps are maximal, while preserving the original grouping of time
points in the argument tuples.
With our framework, the definition of a temporal algebra is carried out in three
steps. The first step is to define the minimal requirements. The minimal requirements
Böhlen et al. / An algebraic framework for temporal attribute characteristics
351
are defined for each algebraic operator; they specify the output tuples together with
associated sets of time points, but they do not group the time points into timestamps.
As an example, we define the minimal requirements for the algebraic operators of a
temporal algebra, CETA.
The second step groups the time points of the output tuples into timestamps and
generates for each possible timestamp a candidate result tuple. For each candidate
result tuple we determine its lineage, i.e., the minimal set of input tuples that are
required to derive it. These input tuples are called required argument tuples.
The third step considers the lineage information and selects those candidate
result tuples, called maximal output tuples, that minimize the required argument tuples
and respect the original grouping of time points as much as possible, i.e., form
maximal intervals. These intervals are the timestamps of the result tuples. We also
adjust the attribute values of the result tuples according to the characteristics of their
attributes to the new timestamp. This is done independently of the specific operators.
This three-step process limits the complexity of the definitions of algebraic
operators, since the minimal requirements, which are defined for each algebraic
operator, do not specify the grouping of time points into timestamps. The framework
generalizes the scope of applicability of built-in support for temporal data in existing
data models and query languages. It brings temporal data management concepts closer
to being supported in contemporary database management systems, which currently
offer only very limited built-in support for managing temporal data.
The remainder of the paper is structured as follows: section 2 covers related
work. Section 3 motivates the proposed framework and also introduces an example to
be used throughout the paper. Section 4 then proceeds to provide basic definitions and
notational conventions. Section 5 covers the adjustment of attribute values according
to the characteristics of their attributes. The algebraic aspects of the framework are
then presented in section 6, and section 7 presents key properties of the framework.
Section 8 applies the framework, defining the characteristics-enabled temporal
algebra, CETA. The paper ends with conclusions and an outlook about future work.
2.
Related work
Different research communities have investigated various aspects of timevarying information [6, 8, 15, 27, 31]. This paper falls within the area of temporal
database systems that strives to provide support for the management of temporal data
[22, 28, 34]. Most temporal database systems assume a point-based semantics that
permits attributes with a constant characteristics only. By also taking into
consideration the grouping of time points, we significantly broaden the scope of
applications of temporal relational algebras [19, 26, 30, 33] and, by implication,
temporal database management systems. This work extends our initial work on
algebras that combine point and interval properties [10, 13]. The framework presented
in this paper can be used to extend any point-based algebra to provide a unique
352
Böhlen et al. / An algebraic framework for temporal attribute characteristics
grouping of time points that is based on the original grouping of time points in the
argument relations. We discuss how to add support for malleable and atomic attribute
characteristics to a temporal algebra.
Terenziani and Snodgrass [31] compare the point-based database approach to
related work in linguistics, philosophy, and artificial intelligence, where the different
semantics that may be tied to time interval associations with data have received more
attention. They argue that it is necessary to distinguish between telic and atelic facts.
An atelic relation is used if truth values are associated with time points, and a telic
relation is used if truth values are associated with time intervals. They emphasize that
the distinction between telic and atelic facts is not rigid: Both are needed, and it must
be possible to convert between the two. They propose a three-sorted temporal relational
model, together with coercion functions to map between atelic and telic relations. Thus,
they have three different types of relations (atelic, telic, and nontemporal) together with
corresponding algebras. Their work focuses on the semantics of temporal relations and
statements. In contrast, we are interested in providing a foundation for the flexible and
non-constraining support that temporal database systems should offer, to enable them
to facilitate applications with different needs.
Assuming a fixed granularity, the malleable characteristic with the uniformity
assumption resembles the constant characteristic. The uniformity assumption makes it
possible to normalize malleable values to the length of a chronon, and then treat them
as constant values. However, a normalization renders the data unsuitable for direct use
by applications. For example, it would be rather cumbersome if applications had to
deal with normalized grants or salaries (funding or salary per second). Further, the
kind of normalization that would be needed impacts many aspects of a database
management system, e.g., indexing, and it is unclear whether normalization is
practical. Also, if the malleable characteristic is refined further (e.g., to use nonuniform distributions), normalization is no longer applicable. Finally, even if all data
were normalized, it might still be relevant for some applications to preserve the
original grouping of time points [10, 11].
The lineage of tuples has previously been defined and used for the maintenance
of data warehouse views [16, 17]. In our framework, the required argument tuples
capture the lineage for temporal algebraic operators, and this lineage is used for the
grouping of time points into timestamps of result tuples according to their original
grouping in the argument relations.
3.
Motivation
To motivate the proposed framework, we proceed to consider concrete examples
of different temporal semantics. This section’s coverage is informal Y the remainder of
the paper offers a formal coverage.
As suggested briefly in the introduction, it is appropriate to apply the different
temporal semantics of data at the level of attributes in our relational context. In
Böhlen et al. / An algebraic framework for temporal attribute characteristics
353
particular, the associations of the values of different attributes in the same relation
with time values may have different semantics. To illustrate these different semantics,
or attribute characteristics, we consider constant, malleable, and atomic attributes.
An attribute is constant if a value of this attribute, associated with some time
value, termed a timestamp, may also be associated, unchanged, with a modified
timestamp. The literature also uses the terms atelic [31] and homogeneous [5] for such
attributes. It is instructive to consider an example.
Consider a project database where the relation EMPL stores information about
employments, including the name of an employee (N), a contract identifier (CI), the
name of a project (P), the hours an employee is assigned to a project (H), and the valid
time (T) over which a tuple holds true. See figure 1.
We assign the constant characteristic to attributes Name and Project. This means
that the values of these attributes also apply when their associated timestamps are
transformed. Suppose that the timestamp ½1; 3 of the first tuple is split into ½1; 2 and
½3; 3. Then values Jan and P1 also hold for the new timestamps, i.e., Jan works for P1
during both intervals.
Next, an attribute is malleable if its values change if the associated timestamps
change. In the example, we assign the malleable characteristic to attribute Hours. This
means that a value of this attribute does not also hold for proper subsets of its timestamp.
In general, additional information is needed in order to determine an appropriate value
of a malleable attribute when a timestamp is modified. In particular, information is
needed on how to inter- or extrapolate a malleable attribute value. For simplicity, we
assume a uniform distribution as a default. Thus, if a timestamp is transformed, the
attribute value is transformed proportionally. Using the first tuple in the example, the
Hours value is adjusted to 200 for the interval ½1; 2 and to 100 for the interval ½3; 3.
Finally, an attribute is atomic if its values cannot be associated with modified
timestamps. An atomic attribute roughly corresponds to Allen’s notion of an event [5].
For example, chemotherapies used in cancer treatment often prescribe a specific
amount of drugs to be taken over a specific time period. Neither the amount of drugs
nor the time period can be modified to prevent wrong prescriptions.
{N/Jan,CI/137,P/P1,H/300,T/[1,3]}
EMPL
N
Jan
Tom
Tom
Ann
Tom
Jan
{N/Tom,CI/138,P/P1,H/400,T/[1,4]}
CI
137
138
147
139
140
142
P
P1
P1
P1
P1
P2
P2
H
300
400
400
300
200
600
T
[1,3]
[1,4]
[5,6]
[1,6]
[2,3]
[2,5]
{N/Tom,CI/147,P/P1,H/400,T/[5,6]}
{N/Ann,CI/139,P/P1,H/300,T/[1,6]}
{N/Tom,CI/140,P/P2,H/200,T/[2,3]}
{N/Jan,CI/142,P/P2,H/600,[2,5]}
1
2
3
Figure 1. A project database.
4
5
6
354
Böhlen et al. / An algebraic framework for temporal attribute characteristics
Note that almost all temporal data models proposed by the temporal database
community assume that all attributes have the constant characteristic. This applies to
the so-called point-based data models [13, 32] and to models where snapshot
equivalent [20, 23] relations are treated as being identical.
A framework that encompasses malleable and atomic attributes must support the
following functionality:
Malleable attributes require additional consideration when timestamps are
modified. Additional information is needed as to how to inter- or extrapolate
malleable attribute values. We will assume a uniform distribution as a default.
Thus, if the timestamp is transformed, the attribute value has to be transformed
proportionally.
Note also that malleable attributes cannot be coalesced in the traditional sense [11].
Thus, two tuples with pairwise identical attribute values and adjacent timestamps
cannot be consolidated into a single tuple with the union of the adjacent timestamps
as its new timestamp. For example, the projection of relation EMPL on attributes
Name, Project, and Hours yields two tuples that, apart from attribute T, have
pairwise identical attribute values, i.e., fN=Tom; P=P1; H=400; T=½1; 4g and
fN=Tom; P=P1; H=400; T=½5; 6g. Coalescing these two tuples leads to the wrong
result that Tom worked for 400 h in the interval ½1; 6. The reason is that the Hours
attribute is malleable.
Timestamps associated with atomic attribute values cannot be modified at all.
Atomic attributes represent facts or events that are accomplished over a specific
time interval. Hence, the time interval cannot be changed as we have seen in the
chemotherapy example.
The characteristic of an attribute is subject to change. We do not make any a priori
assumptions about the characteristics of attributes. Rather, the characteristic of an
attribute may vary depending on the context or the specific operation being
performed on the data. For instance, in the chemotherapy example, an atomic
characteristic is useful as a conservative default. At the same time, a medical doctor
should have the possibility of applying the constant or malleable characteristics
when querying or updating treatments.
We will use constant, atomic, and malleable characteristics with the uniformity
assumption as representative attribute characteristics as we discuss the handling of
different attribute characteristics.
4.
Preliminaries
Next, we introduce basic definitions and notational conventions to be used in the
paper. Specifically, we define a time domain, (temporal) relation schemas, (temporal)
tuples, and (temporal) relations in turn.
Böhlen et al. / An algebraic framework for temporal attribute characteristics
355
Definition 4.1 (Time Domain). A time domain, DT , is a discrete, totally ordered
domain. The elements are termed chronons (or time points), and the ordering relation
is given as <T .
The natural numbers with the relation < satisfy these requirements and we use
them as our time domain throughout. A timestamp is a convex set with respect to <,
i.e., a time interval over this domain, and is represented by two chronons, denoting its
inclusive starting and ending points, respectively. For example, ½1; 7 represents the
timestamp which starts with time point 1 and ends with time point 7. For timestamps,
we introduce membership and subset relations: t 2 T means that chronon t is included
in timestamp T. For two timestamps T and T 0 , T 0 T iff all chronons in T 0 are also
included in T.
A relation schema is a three-tuple S ¼ ð ; ; domÞ, where
is a non-empty,
finite set of attributes, is a finite set of domains, and dom : ! is a function that
associates a domain with each attribute. A temporal relation schema is a relation
schema with at least one time attribute, i.e., an attribute over domain DT , where
DT 2 . For the purpose of this paper, we define the temporal relation schema
R ¼ fA1 ; . . . ; Ak ; Tg. Note that the assumption that each relation has a timestamp
attribute T is just for convenience. There is no implicit timestamp attribute, and all
definitions are parameterized with a timestamp attribute. The rename operator can be
used to adjust schemas as appropriate. For example, in binary operations a timestamp
attribute in one of the argument relations can be renamed so that the argument
relations have a common timestamp attribute.
A tuple over schema S ¼ ð ; ; domÞ is a function x : ! [2 , such that for
every attribute A of , xðAÞ 2 domðAÞ. A tuple is temporal iff its schema is temporal.
We will represent a tuple as a set of attribute/value pairs:
x ¼ fA1 =v1 ; . . . ; Ak =vk ; T=tg
A relation over schema S is a finite set of tuples over S, denoted as r. If S is a
temporal schema, r is called a temporal relation.
To simplify notation and improve readability, we introduce a few abbreviations.
For a tuple x and a set of attributes X, we write x½X for the projection of the tuple on
the attributes in X, i.e., x½X ¼ fA=v j A 2 X ^ A=v 2 xg. For the tuple x ¼ fN=Jan;
CI=137; P=P1; H=300; T=½1; 3g, the projection x½fN; Hg yields fN=Jan; H=300g. For
a tuple x and an attribute A we write x:A to denote the value of the attribute A in x, i.e.,
x:A ¼ v iff A=v 2 x. In the previous example we have x:N ¼ Jan. A vector of relations
r. We define the subset relation ~
s ~
r as follows: ~
s ~
r
hr1 ; . . . ; rl i will be denoted as ~
s ~
r iff s1 r1 ; . . . ; sl rl , and si ri for at least
iff s1 r1 ; . . . ; sl rl . Similarly, ~
one i.
5.
Support for multiple attribute characteristics
To be able to support multiple characteristics, we first augment the definition of a
temporal relation to also capture attribute characteristics.
356
Böhlen et al. / An algebraic framework for temporal attribute characteristics
Definition 5.1 (Attribute Characteristics). The attribute characteristics of a temporal
relation are given by a function C : n fTg ! fa; c; mg.
The attribute characteristics of a relation with schema fA1 ; . . . ; Ak ; Tg are then
given as C ¼ fA1 :c1 ; . . . ; Ak :ck g, where ci are the attribute characteristics. We use m,
a, and c to denote malleable, atomic, and constant characteristics, respectively. For
example, H:m means that the Hours attribute has the malleable characteristic.
The tuples in the result of a query,1 termed output tuples, are derived from the
tuples in the argument relations, termed the input tuples. Each output tuple is generally
shaped by just a subset of the input tuples. We denote the subset of input tuples that affect
the output tuples the required argument tuples (to be defined formally in section 6).
Exactly the required argument tuples have to be considered when calculating an
output tuple of an algebraic operator. For example, in the case of a temporal selection,
the result are all those input tuples that satisfy the selection predicate. Each output
tuple is determined by one required argument tuple Y the one it is identical to.
For other operators, the situation is more complex. Consider a binary temporal
join: Here, an output tuple is produced for each pair of input tuples with timestamps
that intersect. Output tuples have one timestamp attribute, whose timestamp is equal
to the intersection of the timestamps of the input tuples (cf. Definition 8.1). In the case
of the join, the output timestamp is different from the timestamps of the input tuples,
and the non-timestamp attribute values may have to be adjusted.
We proceed to define how the attribute values of a tuple are adjusted according
the characteristics of the attributes. Intuitively, we start out with a temporal tuple
fA1 =v1 ; . . . ; Ak =vk ; T=Ig. The definition then adjusts each non-timestamp attribute
value of the tuple in turn and returns an adjusted tuple. Explanation and examples
follow the definition. Another instance from the literature of adjusting attribute values
is the coercion function introduced by Camossi et al. [14].
Definition 5.2 (Adjustment of Attribute Values). Let ~z ¼ fA1 =v1 ; . . . ; Ak =vk g be a set
of attribute/value pairs, let I be a timestamp, let C be the attribute characteristics, let f
be an aggregate function, and let ~
s ¼ hs1 ; . . . ; sl i be the required argument tuples of
~z [ fT=Ig. The adjustment of attribute values is defined as follows:
adjð~
z; I;~
s; CÞ
¼ fevalðA1 =v1 ; I; S; CÞ; . . . ; evalðAk =vk ; I; S; CÞg; where S ¼ s1 ] . . . ] sl
evalðw; I; S; CÞ
8
A=v
iff w ¼ A=v ^ A: c 2 C
>
>
>
0
>
A=ðv
jIj=jI
jÞ
iff
w ¼ A=v ^ A: m 2 C ^ 9y 2 SðA=v 2 y ^ I 0 ¼ y:TÞ
>
*
>
>
<A=v
iff w ¼ A=v ^ A:a 2 C ^ 9y 2 SðA=v 2 y ^ I ¼ y:TÞ
¼
A=UNDEF
iff
w ¼ A=v ^ A: a 2 C ^ 9y 2 SðA=v 2 y ^ I 6¼ y:TÞ
>
>
>
>
0
>
A =f ðffv j y 2 S ^ A=v ¼ evalðy½A; I; y; CÞggÞ
>
>
:
iff
w ¼ A0 =f ðffA=v1 ; . . . ; A=vq ggÞ ^ S ¼ ffy1 ; . . . ; yq gg :
1
Queries are relational algebra expression, e.g., Example 8.1.
Böhlen et al. / An algebraic framework for temporal attribute characteristics
357
To simplify the notation in the evaluation function, we replace~
s ¼ hs1 ; . . . ; sl i by
a (heterogeneous) bag S.2 The value UNDEF belongs to each domain in and means
that the function is not defined.
The first argument of eval is a single attribute/value pair. The function eval then
uses timestamp I, the required argument tuples S, and the attribute characteristic of the
attribute/value pair for adjusting the attribute value. In the case of a constant attribute,
the attribute’s value is not changed at all. For a malleable attribute, the attribute’s
value is adjusted proportionally according to the new interval I. For an atomic
attribute, the intervals I and I 0 must be identical, and the attribute value is not changed.
Otherwise, the value of an atomic attribute is undefined over I, represented by the
unique constant UNDEF.
For aggregations we have to adjust and aggregate q values. The attribute/value
pairs are collected in the bag of the attribute values to be adjusted. Additionally, the
attribute/value pairs are also included in the required argument tuples, where they are
available together with the associated timestamps. Since the timestamps are needed to
adjust the values, the attribute/value pairs are extracted from the required argument
tuples and adjusted before the aggregation function is evaluated.
Example 5.1 Consider the relation in figure 1. We want to sum up the working hours
per project. With attribute Hours being malleable and the remaining attributes being
constant, we expect the following tuple to belong to the result set:
fP=P1; H=750; T=½1; 3g
This result tuple depends on three input tuples (no. 1, 2, and 4) of relation EMPL. These
tuples must be adjusted before they are used in the aggregate function. We call the adj
function as follows:
adjðfP=P1; SH=sumffH=300; H=400; H=300ggg;
½1; 3;
hffN=Jan; P=P1; H=300; T=½1; 3g;
fN=Tom; P=P1; H=400; T=½1; 4g;
fN=Ann; P=P1; H=300; T=½1; 6ggi;
fP:c; H:mgÞ
¼ fP=P1; SH=750g
In the first step the eval function is pushed into the tuple which is evaluated. Since P is
constant, we get evalðP=P1; . . .Þ ¼ P=P1 for the first attribute. For the evaluation of
2
To avoid attribute name clashes in S, we use qualified attribute names, R:A, and rename attributes as
appropriate.
358
Böhlen et al. / An algebraic framework for temporal attribute characteristics
the second attribute, evalðSH=sumðffH=300; H=400; H=300ggÞ; . . .Þ, the eval function
is applied to the aggregate attribute of each required argument tuple:
evalðH=300; ½1; 3; fN=Jan; P=P1; H=300; T=½1; 3g; fP:c; H:mgÞ ¼ H=300
evalðH=400; ½1; 3; fN=Jan; P=P1; H=400; T=½1; 4g; fP:c; H:mgÞ ¼ H=300
evalðH=300; ½1; 3; fN=Ann; P=P1; H=300; T=½1; 6g; fP:c; H:mgÞ ¼ H=150
The bag of adjusted attribute values is finally evaluated by the aggregation function,
i.e., sumðff300; 300; 150ggÞ ¼ 750, yielding SH=750 as the final result of the adj
function.
Example 5.2 Assume the chemotherapy example and the tuple fN=Jan; D=310;
T=½1; 6g that prescribes a dosage of 310 over a six-day period to Jan. Assume a query
that wants to retrieve information about the treatment during day 1.
adjðfN=Jan; D=310g;
½1; 1;
hffN=Jan; D=310; T=½1; 6ggi;
fN:c; D:agÞ
¼ fN=Jan; D=UNDEFg
Since D has the atomic characteristic and the input and output intervals are different,
undefined is returned for the dosage attribute.
The definition of the evaluation function is flexible and allows the modification
of attribute values if the timestamp is reduced or if the timestamp is extended. For
example, if we want to estimate the hours Jan is working in a specific project based on
his hours in a shorter period, we have to increase the value proportionally to the
increase of the timestamp. It is not necessary that the new timestamp overlaps the old
one. An example of a completely disjoint timestamp is the extrapolation of quarterly
numbers of a company from one year to another.
6.
An algebraic framework for temporal attribute characteristics
We proceed to describe how attribute characteristics are to be taken into account
in algebraic query languages on temporal relations with multiple attribute characteristics. Particular attention is given to the description of how the grouping of time
points in the timestamps of input tuples affects the grouping of time points in the
timestamps of the output tuples.
Böhlen et al. / An algebraic framework for temporal attribute characteristics
359
The overall objective of the algebraic framework is to provide built-in support
for handling multiple attribute characteristics in a temporal setting. Specifically, the
framework supports the adjustment of attribute values if timestamps change and it
preserves the grouping of time points as defined by the input tuples. The definition of
the framework proceeds in the following steps:
1. For each possible (unadjusted) result tuple, determine the set of time points that
should be associated with that tuple (! minimal requirements).
2. Determine all minimal sets of argument tuples that completely determine a convex
subset of time points of each such result tuple (! required argument tuples).
3. Timestamp result tuples with the same required argument tuples with maximal
subsets of convex time points. Use the timestamps together with the associated
argument tuples and the attribute characteristics to adjust the attribute values of the
result tuples by applying the adj function from the previous section (! maximal
output tuples).
These steps will be formalized in the following definitions.
6.1. Minimal requirements
Informally, the minimal requirements determine, for each (unadjusted) output
tuple of an algebraic operator, the set of time points that must be included in one or
more timestamps to be associated with the output tuple.
Definition 6.1 (Minimal Requirements, MR). Let ~
r ¼ hr1 ; . . . ; rl i be argument
relations with timestamp attribute T, ~
z be a set of attribute/value pairs of an output
tuple modulo the adjustment of the attribute values, and E be a set of (not necessarily
connected) time points associated with ~
z. The minimal requirements for an l-ary
temporal operator O is a formula of the form
r; ~
z; EÞ
OMR ½Tð~
r; z~; E1 Þ ^ OMR ½Tð~
r; ~
z; E2 Þ ) E1 ¼ E2 , i.e., for each possible output
such that OMR ½Tð~
tuple there is only one minimal requirement.
This is a general scheme Y the minimal requirements must be defined for each
operator individually. Hence, if we instantiate the above definition for concrete operators, such as the standard relational operators, we have OMR 2 fMR ½P; MR ½X; [MR ;
MR ; MR ; GMR ½G½Fg, where P is a selection predicate, X is a subset of attributes on
which to project, G is a subset of attributes on which to group, and F ¼ f f1 ðA1 Þ=
A01 ; . . . ; fp ðAp Þ=A0p g is a list of aggregate functions fi on attributes Ai along with new
360
Böhlen et al. / An algebraic framework for temporal attribute characteristics
attribute names A0i for the resulting values. For instance, the minimal requirements for
temporal projection would be defined as follows:
z; EÞ iff 8tðt 2 E , 9xðx 2 r ^ t 2 x:T ^ ~z ¼ x½XÞÞ
MR ½X½Tðhri; ~
where ~
z is the unadjusted output tuple and E is the set of all time points for which ~z
holds true.
Recall that the purpose is to specify the time points that must be associated with
result tuples. The minimal requirements of a temporal operator specify for each
prospective, unadjusted result tuple the set of time points E that the timestamp(s) of
that tuple must include.
Below we exemplify the minimal requirements by defining them for temporal
aggregation. The aggregation example is used throughout the paper. We consider the
calculation of the sums of hours, SH, spent on projects as a running example. The
complete definition of the minimal requirements for CETA is given in section 8.
Example 6.1 (Temporal Aggregation, MR). The minimal requirements for the
temporal aggregation are defined as follows:
GMR ½G½F½Tðhri; z~; EÞ iff
8tðt 2 E , 9xðx 2 r ^ t 2 x:T ^
av1 ¼ ffy½A1 j y 2 r ^ t 2 y:T ^ x½G ¼ y½Ggg ^ . . . ^
avp ¼ ffy½Ap j y 2 r ^ t 2 y:T ^ x½G ¼ y½Ggg ^
z ¼ x½G [ fA01 =f1 ðav1 Þ; . . . ; A0p =fp ðavp ÞgÞÞ
~
where G is a subset of the attributes on which to group the result, and F ¼
f f1 ðA1 Þ=A01 ; . . . ; fp ðAp Þ=A0p g as described above.
Applying this definition to the table EMPL we get the following minimal requirements (shown graphically for project P1 in figure 2):
GMR ½P½sumðHÞ=SH½TðhEMPLi;fP=P1; SH=sumðffH=300; H=400; H=300ggÞg;f1; 2; 3gÞ
GMR ½P½sumðHÞ=SH½TðhEMPLi;fP=P1; SH=sumðffH=400; H=300ggÞg;f4; 5; 6gÞ
GMR ½P½sumðHÞ=SH½TðhEMPLi;fP=P2; SH=sumðffH=200; H=600ggÞg;f2; 3gÞ
GMR ½P½sumðHÞ=SH½TðhEMPLi;fP=P2; SH=sumðffH=600ggÞg;f4; 5gÞ
The second minimal requirement actually covers what will turn out to be two output
tuples. This is so because the two output tuples have identical unadjusted nontimestamp attribute values. In particular, the non-timestamp attribute values of these
preliminary output tuples are taken directly from the input tuples, without any
modifications. When we adjust the attribute values based on the relevant argument
tuples and the attribute characteristics, we shall see that two output tuples result.
Böhlen et al. / An algebraic framework for temporal attribute characteristics
361
{N/Jan,...,P/P1,H/300,T/[1,3]}
{N/Tom,...,P/P1,H/400,T/[1,4]}
{N/Tom,...,P/P1,H/400,T/[5,6]}
Input tuples from EMPL
{N/Ann,...,P/P1,H/300,T/[1,6]}
1
2
3
Output time points E
{P/P1,SH/sum({{H/300,H/400,H/300}})}
4
5
6
and output tuples
~
z
{P/P1,SH/sum({{H/400,H/300}})}
1
2
3
4
5
6
Figure 2. Graphical representation of the minimal requirements.
The minimal requirements specify the output tuples modulo the grouping of time
points into interval timestamps and the transformation of malleable attribute values
according to the timestamps that will be associated with the output tuples.
6.2. Required argument tuples
Having associated output tuples and time points, the next step is to trace output
tuples back to input tuples. In general, an output tuple is completely determined by a
small subset of tuples from the input relations, e.g., a result tuple of a binary join is
completely determined by two input tuples, one from each of the two argument
relations. This provides the information how to group time points into timestamps.
Definition 6.2 (Required Argument Tuples, RAT). Let ~
r ¼ hr1 ; . . . ; rl i be argument
relations with timestamp attribute T, let O be an l-ary temporal operator, let I be a
time interval, and let ~
z be a set of unadjusted attribute/value pairs (of a result tuple).
The required argument tuples, ~
s, for ~
z and I are a minimal subset of the argument
relations, ~
s ~
r, that completely determine ~
z:
r; ~
z; I;~
sÞ iff
ORAT ½Tð~
r; ~z; EÞ ^ I EÞ ^
9EðOMR ½Tð~
s; ~
z; E0 Þ ^ I E0 ^~
s ~
rÞ ^
9E0 ðOMR ½Tð~
8~
s 00 ; E00 ð~
s 00 ~
s ^ OMR ½Tð~
s 00 ; ~z; E00 Þ ) I 6 E00 Þ
This definition splits the timestamp E into all possible (convex) sets, i.e., time
intervals I E, and it isolates for each such time interval and corresponding output
362
Böhlen et al. / An algebraic framework for temporal attribute characteristics
tuple ~
z the minimal set of input tuples, ~
s ~
r, that completely determine this specific
output. In other words, the result tuple ~
z [ fT=Ig can be determined entirely from the
required argument tuples ~
s, but not from any proper subset of them.
Example 6.2 (Temporal Aggregation, RAT). We continue our example by determining
the required argument tuples for the aggregation query. We consider each MR and
generate for each timestamp included in the temporal extent of the MR a corresponding
required argument tuple. From the minimal requirement with the associated set of times
f4; 5; 6g, we generate six required argument tuples, including the following ones for
project P1:
GRAT ½P½sumðHÞ=SH½TðhEMPLi;
fP=P1; SH=sumðffH=400; H=300ggÞg;
½4; 6;
hffN=Tom; . . . ; P=P1; H=400; T=½1; 4g;
fN=Tom; . . . ; P=P1; H=400; T=½5; 6g;
fN=Ann; . . . ; P=P1; H=300; T=½1; 6ggiÞ
GRAT ½P½sumðHÞ=SH½TðhEMPLi;
fP=P1; SH=sumðffH=400; H=300ggÞg;
½4; 4;
hffN=Tom; . . . ; P=P1; H=400; T=½1; 4g;
fN=Ann; . . . ; P=P1; H=300; T=½1; 6ggiÞ
GRAT ½P½sumðHÞ=SH½TðhEMPLi;
fP=P1; SH=sumðffH=400; H=300ggÞg;
½5; 6;
hffN=Tom; . . . ; P=P1; H=400; T=½5; 6g;
fN=Ann; . . . ; P=P1; H=300; T=½1; 6ggiÞ
Figure 3 shows a graphical representation of these RATs. The first RAT states
that in order to get output tuple fP=P1; SH=sumðffH=400; H=300ggÞg over time
interval ½4; 6, precisely the three input tuples specified above are required. For the
same MR, the RATs for the time intervals ½5; 5 and ½6; 6 are the same as for ½5; 6,
and the RAT for ½4; 5 is the same as for ½4; 6.
Böhlen et al. / An algebraic framework for temporal attribute characteristics
363
{N/Ann,...,P/P1,H/300,T/[1,6]}
{N/Tom,...,P/P1,H/400,T/[5,6]}
Input tuples
{N/Tom,...,P/P1,H/400,[1,4]}
{P/P1,SH/sum({{H/400,H/300}}),T/[4,4]}
{P/P1,SH/sum({{H/400,H/300}}),T/[5,6]}
Output tuples
{P/P1,SH/sum({{H/400,H/300}}),T/[4,6]}
3
4
5
6
Figure 3. Graphical representation of selected RATs.
6.3. Maximal output tuples
The final step is to identify those required argument tuples that are maximal in
terms of temporal extent and at the same time respect the grouping of time points in
the timestamps of the input tuples. Along with the grouping of time points into
timestamps, we adjust the attribute values of the output tuples to fit their timestamps.
We use the following abbreviations for vectors to keep the definition of the
maximal output tuples syntactically simple:
I 00
¼ hI100 ; . . . ; Iq00 i
I00 I
¼ I100 I ^ . . . ^ Iq00 I
[I 00
¼ I100 [ . . . [ Iq00
~
s00
¼ h~
s 100 ; . . . ;~
s q00 i
~
s00 ~
s
¼~
s100 ~
s ^ . . . ^~
sq00 ~
s
s00 Þ ¼ ORAT ½Tð~
r; ~
z; I100 ;~
s100 Þ ^ . . . ^ ORAT ½Tð~
r; ~z; Iq00 ;~
sq00 Þ
ORAT ½Tð~
r; ~
z; I00 ;~
Definition 6.3 (Maximal Output Tuples). Let ~
r ¼ hr1 ; . . . ; rl i be argument relations
s ~
r, let O be an l-ary temporal
with timestamp attribute T, let ~
s ¼ hs1 ; . . . ; sl i with ~
operator, let z be a set of attribute/value pairs, let I be a time interval, and let C be
attribute characteristics. Then z [ fT=Ig is a maximal output tuple if I cannot be
364
Böhlen et al. / An algebraic framework for temporal attribute characteristics
enlarged without changing the required argument tuples and if no proper subsets of ~
s
qualify as required argument tuples for ~
z over sub-intervals of I:
r; z; I;~
sÞ iff
OMOT ½T½Cð~
9~
zðORAT ½Tð~
r; ~
z; I;~
sÞ ^ z ¼ adjð~
z; I;~
s; CÞ^
ð1Þ
:9I0 ðI0 I ^ ORAT ½Tð~
r; ~
z; I0 ;~
sÞÞ^
ð2Þ
:9I00 ;~
s00 ðI00 I ^ I ¼ [I00 ^~
s00 ~
s00 ÞÞÞ
s ^ ORAT ½Tð~
r; ~z; I 00 ;~
ð3Þ
Condition 1 selects a specific required argument tuple and adjusts the attribute
values in ~
z according to time interval I. Condition 2 ensures that I is a maximal time
interval for the specific RAT, ~
s. Condition 3 avoids coalescing of snapshot-equivalent
tuples. In other words, it respects the grouping of time points in the input tuples by
propagating this grouping to the output tuples.
Example 6.3 (Temporal aggregation, MOT). For the calculation of the MOTs we
need the characteristics of the attributes. Assume the malleable characteristic for
attribute Hours and the constant characteristic for all other attributes. We then get the
following MOTs:
GMOT ½P½sumðHÞ=SH½T½P:c; H:mðhEMPLi; fP=P1; SH=750g; ½1; 3; h. . .iÞ
GMOT ½P½sumðHÞ=SH½T½P:c; H:mðhEMPLi; fP=P1; SH=150g; ½4; 4; h. . .iÞ
GMOT ½P½sumðHÞ=SH½T½P:c; H:mðhEMPLi; fP=P1; SH=300g; ½5; 6; h. . .iÞ
GMOT ½P½sumðHÞ=SH½T½P:c; H:mðhEMPLi; fP=P2; SH=500g; ½2; 3; h. . .iÞ
GMOT ½P½sumðHÞ=SH½T½P:c; H:mðhEMPLi; fP=P2; SH=300g; ½4; 5; h. . .iÞ
The RAT for project P1 over time interval ½4; 6 (resulting from the first MR) is
not a MOT because the third condition of the definition is violated: There exist two
RATs for the same output values for sub-intervals of ½4; 6, namely one for ½4; 4 and
one for ½5; 6. And these two RATs qualify as MOTs. Similarly, we can show that the
RAT that yields fP=P1; SH=750g is the only MOT to cover the interval ½1; 3. No
other required argument tuple over any sub-interval of ½1; 3 qualifies as a MOT due to
the second condition of the above definition.
Definition 6.4 (Characteristics-Enabled Temporal Operator). Let ~
r ¼ hr1 ; . . . ; rl i be
argument relations with timestamp attribute T, let x and z be sets of attribute/value
pairs, and let C be attribute characteristics. Operator O is a characteristics-enabled
Böhlen et al. / An algebraic framework for temporal attribute characteristics
365
temporal operator iff the result tuples coincide with the maximal output tuples, i.e., iff
for all temporal argument relations, ~
r, the following holds:
r; z; I;~
sÞ ^ x ¼ z [ fT=IgÞ
x 2 O½T½Cð~
rÞ , 9z; I;~
sðOMOT ½T½Cð~
Example 6.4 (Characteristics-Enabled Temporal Operator). The result tuples of the
temporal aggregation query are the following:
G½P½sumðHÞ=SH½T½P:c; H:mðhEMPLiÞ ¼ ffP=P1; SH=750; T=½1; 3g;
fP=P1; SH=150; T=½4; 4g;
fP=P1; SH=300; T=½5; 6g;
fP=P2; SH=500; T=½2; 3g;
fP=P2; SH=300; T=½4; 5gg
7.
Properties of the algebraic framework
In this section we prove the key properties of our algebraic framework, which
carry over to concrete algebraic operators defined in terms of their minimal
requirements. Our algebraic framework is an extension of point-based temporal
algebras. It extends the point-based semantics by propagating the grouping of time
points from the input tuples to the output tuples. In the following two theorems we
show that this grouping of the time points from the minimal requirements into time
intervals neither introduces nor drops time points from the output tuples.
Theorem 7.1 (Soundness). Let O be an l-ary temporal operator in our framework,
~
r ¼ hr1 ; . . . ; rl i be the argument relations with timestamp attribute T, z and ~z be output
tuples, E be a set of time points, C be attribute characteristics, and I be a time interval.
The maximal output tuples are sound with respect to the minimal requirements, i.e.,
the maximal output tuples include only time points that are included in the minimum
requirements:
r; z; I;~
sÞ ^ t 2 I ) 9E; ~zðOMR ½Tð~
r; ~z; EÞ ^ t 2 EÞÞ
8t;~
r;~
s; C; z; IðOMOT ½T½Cð~
Proof. The proof traces the generation of maximal output tuples back to the minimal
r; z; I;~
sÞ, is
requirements. By definition of maximal output tuples, a MOT, OMOT ½T½Cð~
r; ~z; I;~
sÞ. By definition
derived from a RAT with the same time interval I, i.e., ORAT ½Tð~
of required argument tuples, RATs group time points from the minimal requirements
into time intervals, which does not introduce new time points. The two steps together
Ì
prove the theorem.
Böhlen et al. / An algebraic framework for temporal attribute characteristics
366
Theorem 7.2 (Completeness). Let O be an l-ary temporal operator in our framework,
~
r ¼ hr1 ; . . . ; rl i be the argument relations with timestamp attribute T, z and ~z be output
tuples, E be a set of time points, C be attribute characteristics, and I be a time interval.
The maximal output tuples are complete with respect to the minimal requirements, i.e.,
the maximal output tuples cover all time points from the minimal requirements:
r; ~
z; EÞ ^ t 2 E ) 9I; z;~
sðOMOT ½T½Cð~
r; z; I;~
sÞ ^ t 2 IÞÞ
8t;~
r; C; ~
z; EðOMR ½Tð~
Proof. The definition of required argument tuples splits the temporal extension E of
MRs into all possible sub-intervals (groupings of convex sets of time points) and
generates a corresponding RAT. For each MR we therefore get one or more RATs,
r; ~
z; I1 ;~
s1 Þ; . . . ; ORAT ½Tð~
r; ~
z; Ip ;~
sp Þ. This is graphically shown in figure 4.
ORAT ½Tð~
Since I1 ; . . . ; I p are all possible sub-intervals of E, all time points of E are covered by
the RATs.
To show that the RATs that qualify as MOTs cover all time points we start with
the RATs over the maximal time intervals. If each of these RATs is also a MOT, all
time points are covered, and we are done. Otherwise, at least one of these RATs, say
r; ~
z; Ij ;~
sj Þ, does not qualify as MOT, and we will show that the time points of
ORAT ½Tð~
Ij are covered by other MOTs. Since in this case condition 3 of the MOT definition is
sjq Þ such that
r; ~z; Ijq ;~
sj1 Þ; . . . ; ORAT ½Tð~
r; ~
z; Ij1 ;~
violated, there exist RATs, ORAT ½Tð~
sj and all Iji together cover all time points
sj ; . . . ; ~
sjq ~
sj1 ~
Ij1 Ij ; . . . ; Ijq Ij and ~
of Ij , i.e., Ij ¼ Ij1 [ . . . [ Ijq . If each of these RATs qualifies as MOT, all time points,
t 2 Ij , are covered, and we are done. Otherwise, either condition 3 or condition 2 of
the MOT definition is violated. In the former case, we can again split at least one of
the RATs into several ones, and we proceed as above. In the latter case, at least one
of the RATs can with the same input tuples, ~
sji , be extended to a larger time interval,
r; ~
z; Ij0i ;~
sji Þ with Ij0i Iji . We show that all time
Ij0i , i.e., there exists a RAT ORAT ½Tð~
points in Ij0i are covered, and consequently all time points in Iji are covered. In this
process, the cardinality of the ~
sj is strictly decreasing, while the union of the intervals
Time points of MR
Maximal time intervals in RATs
..
.
Minimal time intervals in RATs
1
2
3
4
5
6
7
8
9
10
Figure 4. Grouping of time points from the minimal requirements.
Böhlen et al. / An algebraic framework for temporal attribute characteristics
367
Iji covers all time points. Eventually, this process terminates, when no further subsets
of the input tuples ~
sj can be generated and the intervals Iji cannot be further extended.
Then the corresponding RATs qualify as MOTs.
Ì
These two theorems together show that all time points and only those time points
from the minimal requirements are covered by the time intervals of the maximal
output tuples. The next theorem shows that the time intervals of the maximal output
tuples are unique and maximal.
Theorem 7.3 (Uniqueness of Maximal Output Tuples). Let O be an l-ary temporal
operator in our framework, ~
rmin ¼ hr1 ; . . . ; rl i be a minimal vector of input relations
with timestamp attribute T to produce z over I (that is if one tuple is removed from
~
rmin then the result is empty), z be an output tuple, I and I 0 be time intervals, C be
attribute characteristics, and IntðDT Þ be the set of all intervals over the time domain
DT . The timestamp of the maximal output tuples is unique: It groups as many time
points as possible, while preserving the grouping of the time points in the input tuples.
8~
rmin ; z; I; CðOMOT ½T½Cð~
rmin ; z; I;~
rmin Þ)
rmin ; z; I0 ;~
rmin Þ ^ I 6¼ I0 ^ I [ I0 2 IntðDT ÞÞÞ
:9I0 ðOMOT ½T½Cð~
Proof. We have to show that, given a specific output tuple z and a set of input tuples,
~
rmin , for z, the time intervals I in the maximal output tuples are maximal.
Since ~
rmin is minimal, there exists a RAT of the form ORAT ½Tð~
rmin ; ~z; I;~
rmin Þ for
the maximal output tuple on the left-hand side. From the definition of RAT we get also
that for all strict subsets ~
rmin the minimal requirements do not cover I.
Now we do a proof by contradiction. We assume that there exists a MOT of the
rmin ; z; I 0 ;~
rmin Þ, where I 6¼ I0 and I [ I0 2 IntðDT Þ. Then by the
form OMOT ½T½Cð~
rmin ; ~z; I 0 ;~
rmin Þ. Next we
definition of MOTs there exists a RAT of the form ORAT ½Tð~
do a case analysis for the interval I0 . If I 0 I or I 0 I, condition 2 of the MOT
definition leads to a contradiction. Otherwise, I and I 0 are overlapping or adjacent
rmin ; ~z; I00 ;~
s 00 Þ, where I 00 ¼ I [ I0
intervals, and there exists a RAT of the form ORAT ½Tð~
00
min
00
min
and ~
s ~
r . If ~
s ¼~
r , condition 2 of the MOT definition leads to a contradiction.
00
min
r , and condition 3 of the RAT definition leads to a contradiction,
Otherwise, ~
s ~
i.e., since I is not covered by any of the subsets of ~
rmin , I 00 cannot be covered as well.
8.
A characteristics-enabled temporal algebra
In this section we instantiate the framework described in section 6 and define
CETA, a characteristics-enabled temporal algebra. To define a new algebra all we
have to do is to specify the minimal requirements of each operator in the algebra. In
the following definition of CETA operators we adapt the traditional relational algebra
operators selection, projection, union, difference, Cartesian product, and aggregation.
Böhlen et al. / An algebraic framework for temporal attribute characteristics
368
Definition 8.1 (Minimal Requirements CETA Operators). Let r; s be temporal input
relations with schemas fA1 ; . . . ; Ak ; Tg and fB1 ; . . . ; Bl ; Tg, respectively, and
timestamp attribute T, let z be a set of attribute/value pairs of an output tuple
modulo the adjustment of the attribute values, E be a set of (not necessarily connected)
time points, P be a predicate for the selection operator, A and B be abbreviations for
the attributes fA1 ; . . . ; Ak g and fB1 ; . . . ; Bl g, respectively, X be a subset of the
attributes for the projection, G be a subset of the attributes on which to group the
result, and F ¼ ff1 ðA1 Þ=A01 ; . . . ; fp ðAp Þ=A0p g be aggregate functions fi applied to
attributes Ai , with the result values stored as attributes A0i . The minimal requirements
for the standard, temporal relational algebra operators selection, projection, union,
difference, Cartesian product, and aggregation are defined as follows:
z; EÞ iff
MR ½P½Tðhri; ~
8tðt 2 E , 9xðx 2 r ^ t 2 x:T ^ PðxÞ ^ ~
z ¼ x½AÞÞ
z; EÞ iff
MR ½X½Tðhri; ~
8tðt 2 E , 9xðx 2 r ^ t 2 x:T ^ ~
z ¼ x½XÞÞ
z; EÞ iff
[MR ½Tðhr; si; ~
8tðt 2 E , 9xððx 2 r _ x 2 sÞ ^ t 2 x:T ^ ~
z ¼ x½AÞÞ
z; EÞ iff
MR ½Tðhr; si; ~
8tðt 2 E , 9xðx 2 r ^ t 2 x:T ^ ~
z ¼ x½A ^
8yðy 2 s ^ y½B ¼ x½A ) t 62 y:TÞÞÞ
z; EÞ iff
MR ½Tðhr; si; ~
8tðt 2 E , 9x; yðx 2 r ^ y 2 s ^ t 2 x:T ^ t 2 y:T ^ ~z ¼ x½A [ y½BÞÞ
G
MR
½G½F½Tðhri; ~z; EÞ iff
8tðt 2 E , 9xðx 2 r ^ t 2 x:T ^
av1 ¼ ffy½A1 j y 2 r ^ t 2 y:T ^ x½G ¼ y½Ggg ^ . . . ^
avp ¼ ffy½Ap j y 2 r ^ t 2 y:T ^ x½G ¼ y½Ggg ^
z ¼ x½G [ fA01 =f1 ðav1 Þ; . . . ; A0p =fp ðavp ÞgÞÞ
~
For the selection, projection, and union operators, the output tuples inherit the
timestamps of the input tuples unchanged, no attribute values are modified, and only
the grouping of time points has to be considered. For the difference, Cartesian product,
and aggregation operators, the timestamps of the output tuples are usually different
Böhlen et al. / An algebraic framework for temporal attribute characteristics
369
{SN/Jan,S/20,T/[1,5]}
SAL
SN
Jan
Jan
Tom
S
20
20
24
{SN/Jan,S/20,T/[6,7]}
T
[1,5]
[6,7]
[1,6]
{SN/Tom,S/24,T/[1,6]}
1
B ON
BN
Jan
Tom
2
3
4
5
6
7
{BN/Jan,B/12,T/[4,7]}
B
12
4
T
[4,7]
[2,3]
{BN/Tom,B/4,T/[2,3]}
1
2
3
4
5
6
7
Figure 5. The relations EMPL and SAL of the project database.
from those of the input tuples. Hence, in addition to the grouping of time points, the
values of malleable attributes have to be adjusted accordingly.
For the following example we introduce two additional relations in the project
database. The relation SAL stores name ðSNÞ and salary ðSÞ of employees, and the
relation Bon stores name ðBNÞ and bonus ðBÞ of employees. The Salary and Bonus
attributes have a malleable characteristics and are stored in KEuro. The other
attributes are constant. Figure 5 shows these two relations.
Example 8.1 (Temporal Cartesian product). Consider the retrieval of all employees
who in a certain period got a bonus that is equal to or more than half of their salary. If
we are looking at the attribute values as they are stored in the database and consider
them as constant values, we might believe that this is the case only for Jan during the
time interval ½4; 7. Since Bonus and Salary attributes are malleable, however, we have
to adjust these attribute values before applying the selection condition. Then we get a
different result: Jan’s bonus is greater than half of his salary in the time interval ½4; 5
{SN/Jan,S/20,T/[1,5]}
{SN/Jan,S/20,T/[6,7]}
Input tuples from SAL
{BN/Jan,B/12,T/[4,7]}
Input tuple from BON
4
5
6
7
Output time points E
and output tuple z
{SN/Jan,S/20,BN/Jan,B/12}
1
2
3
4
5
6
7
Figure 6. Graphical representation of selected MRs.
Böhlen et al. / An algebraic framework for temporal attribute characteristics
370
(his salary is 8, and his bonus is 6), and Tom’s bonus is half of his salary in the time
interval ½2; 3 (his salary is 8, and his bonus is 4).
Formally, this query can be answered by computing first the temporal Cartesian
product of the two relations and then applying the selection operator:
½P½T½Cðh½T½CðhSAL; BONiÞiÞ
The selection condition is P ¼ ðSN ¼ BN ^ B S=2Þ, and the attribute characteristics
is C ¼ fSN:c; S:m; BN:c; CI:mg.
The minimal requirements for the temporal Cartesian product of SAL and BON
are then given as follows:
MR ½TðhSAL; BONi; fSN=Jan; S=20; BN=Jan; B=12g; f4; 5; 6; 7gÞ
MR ½TðhSAL; BONi; fSN=Jan; S=20; BN=Tom; B=4g; f2; 3gÞ
MR ½TðhSAL; BONi; fSN=Tom; S=24; BN=Jan; B=12g; f4; 5; 6gÞ
MR ½TðhSAL; BONi; fSN=Tom; S=24; BN=Tom; B=4g; f2; 3gÞ
Note that the first minimal requirement combines two pairs of input tuples and
coalesces the corresponding output tuples, since they are equivalent on the nontimestamp attributes values. This situation is graphically shown in figure 6.
The next step is to calculate the required argument tuples. For the first minimal
requirement above we get a total of 11 RATs, one for each sub-interval of the
{SN/Jan,S/20,T/[1,5]}
{SN/Jan,S/20,T/[6,7])
Input tuples from SAL
{BN/Jan,B/12,B/[4,7]}
Input tuple from BON
{SN/Jan,S/20,BN/Jan,B/12,T/[4,5]}
{SN/Jan,S/20,BN/Jan,B/12,T/[6,7]}
Output tuples
{SN/Jan,S/20,BN/Jan,B/12,T/[4,7]}
1
2
3
4
5
6
7
Figure 7. Graphical representation of selected RATs.
Böhlen et al. / An algebraic framework for temporal attribute characteristics
371
temporal extent f4; 5; 6; 7g. Three of these RATs are graphically illustrated in figure 7
and given below:
RAT ½TðhSAL; BONi;
fSN=Jan; S=20; BN=Jan; B=12g;
½4; 7;
hffSN=Jan; S=20; T=½1; 5g; fSN=Jan; S=20; T=½6; 7g; fBN=Jan; B=12; T=½4; 7ggiÞ
RAT ½TðhSAL; BONi;
fSN=Jan; S=20; BN=Jan; B=12g;
½4; 5;
hffSN=Jan; S=20; T=½1; 5g; fBN=Jan; B=12; T=½4; 7ggiÞ
RAT ½TðhSAL; BONi;
fSN=Jan; S=20; BN=Jan; B=12g;
½6; 7;
hffSN=Jan; S=20; T=½6; 7g; fBN=Jan; B=12; T=½4; 7ggiÞ
The last step is to select those RATs that group as many time points as possible
and at the same time respect the grouping of time points in the argument tuples. With
Salary and Bonus attributes being malleable and the other attributes being constant, we
get the following MOTs for the temporal Cartesian product:
MOT ½T½SN:c;S:m;BN:c;B:mðhSAL;BONi;fSN=Jan; S=8; BN=Jan;B=6g;½4; 5;h. . .iÞ
MOT ½T½SN:c;S:m;BN:c;B:mðhSAL;BONi;fSN=Jan; S=20; BN=Jan;B=6g;½6; 7;h. . .iÞ
MOT ½T½SN:c;S:m; BN:c;B:mðhSAL;BONi;fSN=Jan;S=8; BN=Tom;B=4g;½2; 3;h. . .iÞ
MOT ½T½SN:c;S:m;BN:c;B:mðhSAL;BONi;fSN=Tom;S=12; BN=Jan;B=9g;½4; 6;h. . .iÞ
MOT ½T½SN:c;S:m;BN:c;B:mðhSAL;BONi;fSN=Tom;S=8; BN=Tom;B=4g;½2; 3;h. . .iÞ
The RAT with the output tuple fSN=Jan; S=20; BN=Jan; B=6g and time interval ½4; 7
is not a MOT, since condition 3 in the definition is violated, i.e., there exist RATs for
the same output tuple over sub-intervals of ½4; 7 and they are produced by a subset of
the input tuples. In fact, we have one RAT over the time interval ½4; 5 and one over
the time interval ½6; 7. Both of them qualify as MOT and completely cover the
timestamp of the first minimal requirement.
372
Böhlen et al. / An algebraic framework for temporal attribute characteristics
The result of the temporal Cartesian product is then given as follows:
½T½SN:c; S:m; BN:c; B:mðhSAL; BONiÞ ¼ ffSN=Jan; S=8; BN=Jan; B=6; T=½4; 5g;
fSN=Jan; S=20; BN=Jan; B=6; T=½6; 7g;
fSN=Jan; S=8; BN=Tom; B=4; T=½2; 3g;
fSN=Tom; S=12; BN=Jan; B=9; T=½4; 6g;
fSN=Tom; S=8; BN=Tom; B=4; T=½2; 3gg
We denote this intermediate result relation as SALBON. It is the input for the
following temporal selection, which retrieves all tuples with corresponding names and
where the bonus is equal or greater than half of the salary, i.e., SN ¼ BN ^ B S=2.
The minimal requirements for the temporal selection are given as
MR ½TðhSALBONi; fxSN=Jan; S=8; BN=Jan; B=6g; f4; 5gÞ
MR ½TðhSALBONi; fSN=Tom; S=8; BN=Tom; B=4g; f2; 3gÞ
We calculate the required argument tuples and the maximal output tuples in turn
and get as a final result the expected output tuples
½SN ¼ BN ^ B S=2½T½SN:c; S:m; BN:c; B:mðhSALBONiÞ ¼
ffSN=Jan; S=8; BN=Jan; B=6; T=½4; 5g;
fSN=Tom; S=8; BN=Jan; B=4; T=½2; 3gg
9.
Summary
This paper offers a generalized algebraic framework that supports a variety of
temporal semantics. Specifically, we show that it is possible to extend a strictly pointbased semantics to support malleable and atomic attributes. This can be achieved
without jeopardizing the degree of temporal support and without adding complexity to
the basic definitions of algebraic operators. The framework may be carried over to
SQL. For instance, the query in Example 8.1 could be expressed by the following
SQL-query: SELECT * FROM SAL½SN : c; S : m; BON½BN : c; CI : mÞWHERE SN ¼
BN AND CI S=2.
In order to support malleable and atomic attribute characteristics, the grouping of
time points into timestamps must be controlled precisely. A natural choice is to
preserve the original groupings of time points. We use the lineage of result tuples to
group time points according to their original grouping.
In future work, we aim to investigate efficient implementation in detail. Based on
the MD join [2, 3], a general-purpose algebraic aggregate/join operator for multi-
Böhlen et al. / An algebraic framework for temporal attribute characteristics
373
dimensional data, we are currently developing an algorithm for the temporal
aggregation defined in this paper.
Another promising direction for future research is to incorporate the algebraic
framework proposed in this paper into one of the several temporal extensions to SQL
that have been proposed [9]. The resulting, more flexible support for time-referenced
data may constitute an important step towards built-in support for temporal database
management in database management systems.
Acknowledgements
We thank the anonymous reviewers whose careful work and suggestions significantly improved the presentation of this paper. The work was supported by the
Municipality of Bozen-Bolzano through the eBZ-2015 initiative.
References
[1] T. Abraham and J.F. Roddick, Survey of spatio-temporal databases, Geoinformatica 3(1) (March
1999) 61Y99.
[2] M.O. Akinde and M.H. Böhlen, The efficient computation of subqueries in complex OLAP queries,
in: Proceedings of the 19th International Conference on Data Engineering, Bangalore, India (March
2003) pp. 163Y174.
[3] M.O. Akinde, M.H. Böhlen, T. Johnson, L.V.S. Lakshmanan and D. Srivastava, Efficient OLAP
query processing in distributed data warehouses, in: Proceedings of the Eighth Conference on
Extending Database Technology, Prague, Czech Republic, (March 2002) pp. 336Y353.
[4] J.F. Allen, Maintaining knowledge about temporal intervals, Communications of the ACM 16(11)
(November 1983) 832Y843.
[5] J.F. Allen, Towards a general theory of action and time, Artificial Intelligence 23(2) (July 1984) 123Y154.
[6] I. Androutsopoulos, Exploring Time, Tense and Aspect in Natural Language Database Interfaces
(Benjamins, 2002).
[7] J. BenYZvi, The Time Relational Model, PhD thesis, Computer Science Department, UCLA, 1982.
[8] M. Böhlen, J. Chomicki, R. Snodgrass and D. Toman, Querying TSQL2 databases with temporal
logic, in: Proceedings of the Fifth International Conference on Extending Database Technology,
Avignon, France (March 1996) pp. 325Y341.
[9] M.H. Böhlen and C.S. Jensen, Temporal data model and query language concepts, in: Encyclopedia
of Information Systems, Vol. 4 (Academic, 2003) pp. 437Y453.
[10] M.H. Böhlen, C.S. Jensen and R.T. Snodgrass, Temporal statement modifiers, ACM Transactions on
Database Systems 25(4) (December 2000) 407Y456.
[11] M.H. Böhlen, R.T. Snodgrass and M.D. Soo, Coalescing in temporal databases, in: Proceedings of
the 22nd International Conference on Very Large Data Bases, Mumbai (Bombay), India (February
1996) pp. 180Y191.
[12] I.T. Bowman and D. Toman, Optimizing temporal queries: Efficient handling of duplicates, Data
and Knowledge Engineering 44(2) (February 2003) 143Y164.
[13] M.H. Böhlen, R. Busatto and C.S. Jensen, Point- versus interval-based temporal data models, in:
Proceedings of the 14th International Conference on Data Engineering, Orlando, Florida (February
1998) pp. 192Y200.
374
Böhlen et al. / An algebraic framework for temporal attribute characteristics
[14] E. Camossi, E. Bertino, M. Mesiti and G. Guerrini, Handling expiration of multigranular temporal
objects, Journal of Logic and Computation 14(1) (February 2004) 23Y50.
[15] J. Chomicki, D. Toman and M.H. Böhlen, Querying ATSQL databases with temporal logic, ACM
Transactions on Database Systems 26(2) (June 2001) 145Y178.
[16] Y. Cui, J. Widom and J.L. Wiener, Tracing the lineage of view data in a warehousing environment,
ACM Transactions on Database Systems 25(2) (June 2000) 179Y227.
[17] Y. Cui and J. Widom, Lineage tracing for general data warehouse transformations, in: Proceedings
of the 27th International Conference on Very Large Databases, Rome, Italy (September 2001) pp.
471Y480.
[18] O. Etzion, S. Jajodia and S. Sripada, eds., Temporal Databases: Research and Practice, Lecture
Notes in Computer Science, Vol. 1399 (Springer, 1998).
[19] D. Gabbay and P. McBrien, Temporal logic & historical databases, in: Proceedings of the 17th
International Conference on Very Large Databases, Barcelona, Catalonia, Spain ( September 1991)
pp. 423Y430.
[20] S.K. Gadia, Weak temporal relations, in: Proceedings of the Fifth ACM Symposium on Principles of
Database Systems, Cambridge, Massachusetts, USA (March 1986) pp. 70Y77.
[21] S.K. Gadia, A homogeneous relational model and query languages for temporal databases, ACM
Transactions on Database Systems 13(4) (December 1988) 418Y448.
[22] C.S. Jensen and C.E. Dyreson, A consensus glossary of temporal database concepts Y February 1998
Version, in: [O. Etzion et al., 1998] (1998) pp. 367Y405.
[23] C.S. Jensen, M.D. Soo and R.T. Snodgrass, Unifying temporal models via a conceptual model,
Information Systems 19(7) (1994) 513Y547.
[24] R. Kowalski and M. Sergot, A logic-based calculus of events, New Generation Computing 4(1)
(1986) 67Y95.
[25] J. McCarthy and P.J. Hayes, Some philosophical problems from the standpoint of artificial
intelligence, Machine Intelligence 4 (1969) 463Y502.
[26] L.E. McKenzie and R.T. Snodgrass, Evaluation of relational algebras incorporating the time
dimension in databases, ACM Computing Surveys 23(4) (December 1991) 501Y543.
[27] R. Nelken, Questions, Time, and Natural Language Interfaces to Temporal Databases, PhD thesis,
The Technion Y Israel Institute of Technology, 2001.
[28] R.T. Snodgrass, Developing Time-Oriented Database Applications in SQL, (Morgan Kaufmann, San
Francisco, California, 2000).
[29] A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev and R. Snodgrass, eds., Temporal Databases:
Theory, Design, and Implementation (Benjamin/Cummings, Redwood City, California, 1993).
[30] A.U. Tansel, Temporal relational data model, IEEE Transactions on Knowledge and Data
Engineering 9(3) (May/June 1997) 464Y479.
[31] P. Terenziani and R.T. Snodgrass, Reconciling point-based and interval-based semantics in temporal
relational databases: A treatment of the telic/atelic distinction, IEEE Transactions on Knowledge
and Data Engineering 16(5) (May 2004) 540Y551.
[32] D. Toman, Point-based vs interval-based temporal query languages, in: Proceedings of the 15th
ACM Symposium on Principles of Database Systems, Montreal, Canada (June 1996) pp. 58Y67.
[33] A. Tuzhilin and J. Clifford, A temporal relational algebra as a basis for temporal relational
completeness, in: Proceedings of the 16th International Conference on Very Large Databases,
Brisbane, QLD, Australia (August 1990) pp. 13Y23.
[34] Y. Wu, S. Jajodia and X.S. Wang, Temporal database bibliography update, in: [O. Etzion et al.,
1998] (1998) pp. 338Y366.