A Novel Approach to Model NOW in Temporal Databases
Bela Stantic, John Thornton, Abdul Sattar
School of Information Technology
Griffith University Gold Coast, Australia
b.stantic, j.thornton, a.sattar ✁ @griffith.edu.au
Abstract
In bitemporal databases, current facts and transaction
states are modelled using a special value to represent the
current time (such as a minimum or maximum timestamp or
NULL). Previous studies indicate that the choice of value
for now (i.e. the current time) significantly influences the
efficiency of accessing bitemporal data. This paper introduces a new approach to represent now, in which current
tuples and facts are represented as points on the transaction time and valid time line respectively. This allows us to
exploit the computational advantages of point-based query
languages. Via an empirical study, we demonstrate that
our new approach to representing now offers considerable
performance benefits over existing techniques for accessing
bitemporal data.
1 Introduction
Relational data models and their implementations usually only capture a snapshot or current state of the real
world. A transaction then changes the database from one
state to another by replacing the old values with new ones.
However, there are many application domains where it is
necessary to keep the old database states or even store future
states. In addition, most production databases contain some
amount of time dependent data and most database technology applications are temporal in nature (e.g. scheduling,
financial and scientific applications). In fact, it is difficult
to identify any database application that does not require
some form of time-varying data. Conversely, the built-in
temporal support offered by commercial database products
and the Relational Query language ✂☎✄✝✆✟✞✡✠☛✠☛✠ [8] is limited to predefined, time-related data types. Some commercial databases include temporal extensions, e.g. the Oracle TimeSeries cartridge, Oracle 9i “Flash-Back”, and the
Informix TimeSeries Data-Blade, but these extensions still
do not fully support the successful management of timevarying data. Research has demonstrated that applications
can significantly benefit from using a temporal RDBMS,
and also that temporal support is needed that goes beyond
simple data types [7].
Associating data with time values and keeping a history
of fact validity is not technically difficult even using nontemporal RDBMS technology [13] [4]. However, it is a difficult task to efficiently query such time-varying data and to
identify integrity constraints that hold over several database
states.
A database is considered temporal if it is able to manage
time-varying data and it supports some time domain distinct from user-defined time. In temporal databases time can
be captured along two distinct time lines: transaction time
and valid time. A bitemporal database is a combination of
valid time and transaction time databases and records the
database states with respect to both valid time and transaction time. The valid time line represents when a fact is valid
and the transaction time line represents when a transaction
was performed. Recording bitemporal data generally requires that updates are appended to the database (rather than
overwriting existing values). This can easily lead to the storage of large volumes of data, and consequently makes the
selection of efficient access methods very important. Storing bitemporal data also requires the selection of appropriate time units (granularities). Without careful management,
the informational benefits of bitemporal data can be easily
outweighed by the costs of poor access times and difficulties in formulating queries [5].
In our current work we use the TQuel four-timestamp
format to represent bitemporal data [11], where, in addition
to non-temporal attributes, each tuple has four temporal attributes: ☞✍✌✏✎ and ☞✝✌✏✑ representing the starting and ending
time points of fact validity in the modelled world, and ✒✓✌✔✎
and ✒✓✌ ✑ representing the time when a tuple is inserted in the
database and the time it is logically deleted. Sample bitemporal data is shown in Table 1. A tuple is considered current
if it is part of the current database state, i.e. it has not been
logically deleted by assigning a timestamp to ✒✓✌ ✑ different
from the value of now. In the literature such a tuple, where
the validity of a fact in the modelled world is valid up to the
current time, is called now-relative.
Despite two decades of research in temporal databases,
relatively few papers have addressed the issue of indexing
temporal data. Even less have addressed the issues of how
to index now-relative data, or temporal data that are current
or valid now. Existing research shows that regular indices,
such as
- trees, are unsuited for temporal data [10], and
recently several other indices have been proposed. Only a
few index structures address the need to store the current
time, a need which is accommodated by almost all temporal data models and is natural and meaningful for many
kinds of applications. As valid time and transaction time
are considered to be orthogonal [12], bitemporal data can be
represented in two dimensional space, enabling us to apply
spatial indexes. For bitemporal indices based on R-trees,
the maximum-timestamp approach is a straightforward solution to the indexing of now-relative data. But it is obvious
that in this approach, facts with now-relative valid-time intervals are represented using very large rectangles, and the
resulting search performance is poor due to excessive dead
space in the index nodes and overlap between nodes. We
are aware of only a few structures that address the issues related to storing now-relative data in temporal databases [1],
[2], [9]. Some of the proposals rely on special variables until changed and now that should be part of a not yet existing
temporal relational model. Research also suggests that the
widespread acceptance of such a model is unlikely, due to
the large commercial investment in the existing relational
model, both in terms of developed code and expertise [7].
At the same time, due to the significant drop in the price of
disk storage, more and more database applications are using added temporal dimensions and, as a consequence, are
facing increasingly poor response times.
Bitemporal databases store past, present and even future
facts in either logically deleted or current tuples. To represent that a fact is current now, or that a tuple has not been
logically deleted, requires the storing of a value representing the current time. In the literature, several concepts to
represent current time have been proposed by including special variables, such as : “now” , “until-changed”, “forever”,
“ ”, “@”, and “-”. However, the same basic issue applies
to any approach, i.e. how physically to store that concept
in the database [3]. As now is not part of the domain of
SQL1999 values [8], it is necessary to represent the current
time by some other value, in such a way that the chosen
value is not overloaded (i.e. does not have more than one
meaning).
It has been shown that the choice of the physical value
for now significantly influences the efficiency of accessing bitemporal data [14]. Currently, the literature has concentrated on three basic approaches: firstly using NULL,
secondly using the smallest timestamp and thirdly using
the largest timestamp supported by the particular RDBMS.
✄
✂✁
A disadvantage of using NULL is that columns that permit NULL values prevent the RDBMS from using indexes.
Conversely, using a non-NULL value can also affect indexing badly. For example, when an index is used to retrieve
tuples with a time period that overlaps current time, and now
is represented with smallest or largest timestamp value, tuples with the ☞ ✌ ✑ (Valid time end) or ✒✓✌ ✑ (Transaction time
end) attribute set to now will not be in the range retrieved.
We have seen that current facts are represented by assigning the value now to ☞ ✌ ✑ , and that by assigning now to
✒✓✌✏✑ we can represent the belief that a tuple is current (or not
logically deleted). This shows importance of now in bitemporal databases. Further, the importance of now increases
when we consider that current tuples and current facts are
likely to be accessed more frequently. While issues related
to storing now are discussed in the literature in the context
of temporal databases, this equally applies to conventional
relational DBMS technology.
In this paper we propose a new approach to modelling
current time in temporal databases that overcomes the limitation of an attribute set to now not being in the range retrieved. In addition, our approach has significant computational advantages over the previously proposed methods
and, to the best of our knowledge, it is the only approach
to representing now that ensures the value now is not overloaded.
In the remainder of the paper, we first look more closely
at previous approaches to modelling current time and highlight their limitations and disadvantages. Then, in section
3, we present the “core” of the new proposed method for
representing current time, and empirically evaluate our approach in comparison to two of the standard existing approaches. Finally, in section 4, we present our conclusions
and discuss possible extensions and future work.
2 Traditional representations of current time
As time seems to be continuous, and current time is everincreasing, a significant question in computer science, particularly with respect to databases, is how to store the value
of current time (or now).
In line with the existing research, we have accepted a
discrete model of time. Since digital computers only support a limited granularity for real numbers, most proposals
for adding time to the relational model are based on a discrete, totally ordered set of time instants to represent both
the transaction and valid time dimensions. This ordering is
defined as follows:
☎✝✆✟✞✡✠
✜✤✣ ✠ ✄✥✜
✦ ✌✖✧☞✌ ✌✖✧ ✧✩★ ✌☞☛✍☎✪✌ ✌✏✆✍✎✑✫✬✌✓✠✒✔✒✓✄✭✒✔✌✏✜✯✖✌ ✕✖✮✱✌✔✰ ✒✓✌✖✒✔✧✩✒✓✲ ✌✘✗✚✌✖✙✑✧ ✛✢✧✩✲✳
✄✵✴✷✶
✰ ✌✖✧ ✧✸✲ ✌✖✧✩✲✳✄✹✴✺✶✻✰ ✌✖✧ ✞ ✌✖✧ ✧✩✲✭✄✵✴
For valid time :
☎✁✸✞ ✠
✦ ✌ ✧ ✌✏✌ ✧ ✧ ★ ✌☞☛✍☎✂✌✏✌✏✘✫✎ ✌✓✠✒✔✗✚✒✓✒ ✙✑✌ ✛✟✌✖✕✘✜✢✌✔✒✓✮✱✒✓✰ ✒✓✌ ✌✘✧ ✗✚✲ ✙✑✛✟✌ ✧ ✜ ✧ ✲✵✗✚✙✑✛ ✴✺✶
✰ ✌✖✧ ✧✩✲ ✌✖✧✩✲✹✗✚✙✑✛ ✴✺✶ ✰ ✌✖✧ ✞ ✌✖✧ ✧✩✲✵✗✚✙✑✛ ✴
For Transaction time :
In a discrete totally ordered model, a time interval, denoted as [ ☞ ✌ ✎ , ☞✝✌ ✑ ), represents a set of a countably infinite
equidistant time instants [4], where ☞✝✌ ✎ is the starting time
instant and ☞✍✌ ✑ is the ending time instant representing the
starting and ending boundaries respectively. These time instants are the smallest and largest values on the time line in
the set of continuous time instants making up a given interval. Also, a time interval [ ☞ ✌ ✎ , ☞✝✌✏✑ ) is closed on the lefthand side and open on the right. This means that the start
point of the interval = ☞✍✌ ✎ and the end point ✲ ☞✝✌ ✑ , i.e. the
interval includes the point ☞✝✌ ✎ and excludes the point ☞✍✌ ✑ .
In such a model, now-relative data contained in tuples
that are currently valid, can be represented as [ ☞ ✌ ✎ , now ).
Here ☞✍✌✏✎ represents the time point when the fact started to
be true and now represents that the fact is still current and
that the time validity of fact is continuously expanding (i.e.
its end is unknown).
Assuming we are interested in using existing technology
and given that the domain of SQL1999 values does not contain a special value for now, the task becomes one of selecting an appropriate value for now from an existing domain. This value should firstly satisfy the requirement that
it cannot be used with some other meaning, otherwise the
meaning becomes ambiguous and the value is overloaded.
As mentioned before, previous work has concentrated
on three physical values to represent now: the NULL
value , the smallest timestamp (MIN) and largest timestamp
(MAX) supported by a particular RDBMS. It is clear that
whichever of these values is chosen, the domain of the data
type becomes limited and a potential for overloading is created. This is especially the case for the NULL value, as it
is already overloaded in its normal usage. However, the
NULL value does have the advantage that it takes up less
space than a regular timestamp value and can be processed
faster. Despite this, the crucial disadvantage of NULL is
that columns that permit NULL values cannot be indexed
by a conventional RDBMS, leading to potentially unacceptable access times.
It is important to mention that using a non-NULL value
for now also can affect indexing. If, for example, a B-tree
index on ☞ ✌ ✑ or ✒ ✌✏✑ is used to retrieve tuples with a time
period that overlaps now, and now is represented with the
smallest or largest timestamp value, tuples with the ☞✝✌ ✑ or
✒✓✌ ✑ attribute set to now will not be in the range retrieved
(i.e. they will be at the extreme left or extreme right of
the B-tree). We term this as the range indexing problem.
Hence all previous approaches to model current time using
the largest timestamp value (MAX), the smallest timestamp
value (MIN) or NULL have disadvantages related to indexing. This has been highlighted in the literature and is particularly relevant to accessing bitemporal data, where the
choice of the value of now has been shown to significantly
affect access efficiency [14].
3 A new approach to model now
Each of the previously discussed approaches to representing now has severe limitations in terms of indexing or
overloading. Of these approaches, the literature generally
agrees that MAX is the best overall compromise [14], as it
allows indexing and generally has better performance than
MIN. However, MAX and MIN have further performance
problems navigating and updating time value indexes, due
to the redundancy of the special timestamp value used to
represent the current time. This means that all ✒✓✌ ✑ and ☞✝✌ ✑
values will be indexed to the same special value (i.e. MAX
or MIN) causing the index to search sequentially through
these records. We term this the index redundancy problem.
A major aim of our research is to overcome the limitations of previous approaches to representing now. These
limitations have been identified as overloading, the range
index problem and the index redundancy problem. From
a consideration of the redundancy problem, it became clear
that our solution should produce a value for now that is related to some distinct property of the tuple to which it refers.
In this way the level of redundancy can be reduced. Also, to
avoid the range index problem, a value is needed that is contained in the interval between the start time and the actual
current time. We therefore concluded that the best solution
would be to make the end point of any current interval equal
to the start point (i.e. ☞✍✌ ✑ = ☞✍✌ ✎ and ✒✓✌ ✑ =✒ ✌ ✎ ). This representation therefore defines the actual interval between the
start and end points of a current interval to be zero (i.e. it
becomes a point on the time line rather than an interval).
A first objection to this approach could be to say that
the value is overloaded, e.g. it becomes impossible to distinguish between an interval that actually started on the
12.12.01 and finished on the same day, from one that started
on 12.12.01 and is still current (given that the granularity
of our time value or chronon is one day). However, using
the definitions of interval start and end points we can see
that this objection is not valid: i.e. the interval [12.12.01,
12.12.01) is closed on the left (and so starts on the 12.12.01)
and open on the right (and so finishes before 12.12.01).
Therefore this interval has no meaningful duration. If it is
required to distinguish the start and end times more finely,
then the granularity of the time value must be changed (e.g.
from days to hours). From this discussion, we can see that
our new approach to representing current time also solves
the overloading problem as any tuple where ☞✝✌ ✑ = ☞✝✌✏✎ or
✂✁
✦
✭
✽
❆
✮■✮❂✮
✦✯❏❑✰ ✭ ✰✞❏❑▲ ❀ ✲
✄✆☎✞✝✠✟
✧ ✟✏★✂☎✂✕
✳✴✑✓✟✶✵✸✷✹☎✂✕
✧ ☎✂✻✯❃
✳✴✑✓✟✼❇✂✟✣✕
✮■✮■✮
✮■✮■✮
✡☞☛✍✌✏✎✒✑✓✎✔☛✖✕
✁✪✩✬✫
✤✛✟✣☎✂✺✏✷✹✟✼✻
✫❅❄✿✝✠✎✔✕
❈❊❉❋❉✹✎●✺❍✟✣✻
✮❂✮■✮
✮❂✮■✮
✗ ✘☞✑✚✙
✭ ✯✦ ✮ ✰✯✱✂✮ ✰✯✰
✭✖✽ ✮ ✰✍✾✿✮ ✰✯✰
✘✛✑✚✜✣✢
✦✣✰✂✮ ✰✯✲✂✮ ✰ ✭
✽ ✰✂✮ ✰✂✦✯✮ ✰✞✦
22.03.99
21.02.01
22.03.99
21.02.01
✮■✮■✮
✮■✮■✮
✮■✮❂✮
✮■✮❂✮
✗ ✤✥✑✚✙
✤✥✑✚✜✏✢
26.09.00
✰✂✦✯✮ ✰✍✾✿✮ ✰✯✰
26.09.00
10.04.99
10.04.99
✦ ✽ ✮ ✰ ✭ ✮ ✰✞✦
✮■✮■✮
✮■✮■✮
✭❁❀ ✮❂✦✏✰✞✮ ✰✯✰
✭✖✽ ✮ ✰ ✭ ✮ ✰✂✦
✮■✮❂✮
✮■✮❂✮
Table 1. Sample Bitemporal data with POINT representation of now
✒✓✌ ✑ =✒✓✌ ✎
is unambiguously current.
To illustrate our new point-based approach to now
(which we term POINT), consider the relation in Table 1.
Here, in tuple ID = 3, we can conclude that Mark has the
current position of “Admin”, firstly because ☞✍✌ ✑ equals ☞✍✌✏✎
(meaning the fact is currently valid) and secondly because
✒✓✌✏✑ equals ✒ ✌✏✎ (meaning the tuple has not been deleted).
Tuple ID = 1 represents that Megan had the position of
“DBA” from “21.08.00” to “10.05.02” and because the tuple is current (as ✒✓✌ ✑ equals ✒✓✌ ✎ ) this represents our current
belief about Megan’s past position. Tuple ID = 2 is logically
deleted (as ✒✓✌ ✑ differs from ✒✓✌ ✎ ), which means that we believed from “01.07.00” to “26.10.00” that Stephan was employed as a “Teacher” between “23.07.00” and “30.01.01”.
Tuple ID = 4 is logically deleted (as ✒✓✌ ✑ differs from
✒✓✌✏✎ ), which means that we believed from “13.02.01” to
“23.02.01” that Steven has a current employment as a “Officer” from “21.02.01”. Hence, when the timestamp for ☞✝✌✔✑
is the same as ☞✍✌✏✎ it means that a fact is valid now. Similarly when ✒✓✌✏✑ is the same as ✒✓✌ ✎ it means that a tuple
is current. Note also that in Table 1 we are displaying the
data as it would be stored in the database, in actual practice we would expect the end time value when ☞✍✌ ✑ = ☞ ✌ ✎ or
✒✓✌ ✑ =✒✓✌ ✎ to be displayed to the user as some special symbol
(such as now, until changed or NULL).
3.1 Experiments
In order to evaluate our POINT approach to representing now we decided to empirically compare POINT to both
the MAX timestamp approach and to using NULL. In doing this we followed previous research in the area, but in
contrast to previous work, we decided not to test the minimum timestamp value (MIN) as this has already been shown
to be consistently worse than MAX [14]. For each of our
methods we generated three relations, each differing only in
the physical representation of the current time value. Then
on each relation we performed three different representative
time slice queries shown in Figure 1. We chose time slice
queries because of their recognised importance in temporal
databases [15].
Query One retrieves the current state in both transaction
and valid time. It selects tuples with transaction and valid
Figure 1. Types of Time slice Queries
time intervals that both overlap with the current time, as
well as retrieving tuples where valid time and transaction
time ends at now. Basically, Query One retrieves what we
currently believe about the current state of the world.
Query Two time slices the relation as of now in transaction time and as of a past time in valid time. In other words
it retrieves our current belief about a past state of the world.
Query Three time slices the relation as of a past time in both
transaction time and valid time, which means it retrieves a
past belief about a past state of reality. Queries One and
Two favour the current state, because this state is assumed
to be accessed much more frequently than old states.
Results for 10% current data
Disk CPU Duration Exp. Type
✭✖❀
✾✯✲ ❀✖❆
✾❁✲❁✱
POINT10-1
✭ ▲
✦ ❆✍✽❁✽✯✽
✦✼✲ ❀ ✲
MAX10-1
✭ ✲ ❀❁❀✯❀
✽✯✭
✦✯✦ ✽ ✱
NULL10-1
✭❁✭✖❆ ✱✍✲
✽❁❀
✦✣✱✯✲✯✲
POINT10-2
✽ ✦✣✱ ✭ ✾
✭ ✱ ✭✯✭
✲✿✦
MAX10-2
❆ ✱✍✾✖✰ ✽
✭❁❀ ✰✍✾
✾ ❆
NULL10-2
❆ ▲✍✲❁✾✂✦
✽ ✱
✦ ❀ ✰✯✱
POINT10-3
✭ ✦✣✲❁✰
✲ ✭ ✾ ❆ ▲
✲❁✾
MAX10-3
❀✯✭
✲✖▲ ✽ ✱ ✭
✦✣✱❁✰✞✦
NULL10-3
time slice query type (i.e. either Query One, Two or Three).
Results for 20% current data
Disk CPU Duration Exp.Type
✽ ✾
✦ ✭❁❀✖❆ ✦
✦ ✭✖❆ ▲
POINT20-1
✭ ✾❁✱ ❆ ✱
✭✖❀ ✾✂✦
❆ ✾
MAX20-1
❆ ✰✯✱✂✦✣✱
❆ ▲
✦ ❀✯✽ ✱
NULL20-1
✽ ✲ ✭ ✾✯✲
✭ ✲✯✾✯✲
✲ ❀
POINT20-2
❀ ✲ ❆❁❆ ✱
❆✯✽ ▲✍✾
▲✞✦
MAX20-2
❆ ✦✼✲ ❆
✱ ✭ ✰ ❆ ✦
▲✯✱
NULL20-2
❆✿✭ ✾ ❀✍✭
✭ ✾❁▲ ✭
❀ ✲
POINT20-3
✽ ✦ ❀ ✱
✱ ❀✍✭ ✱✯✰
✱✞✦
MAX20-3
✭ ▲✯▲✯✱
▲✯✱✯✰ ✽✯✽
▲✯▲
NULL20-3
3.2 Analysis
Our tests were performed on a four 450MHZ CPU SUN UltraSparc II processor machine, running Oracle 9.2.0
RDBMS, with a database block size of 8K. During the tests
the server had no other significant load. We performed experiments using four different sizes of SGA (System Global
Area): 30MB, 50MB, 100MB and 200MB in order to in-tree
vestigate the effects of aging buffers. We created
composite indexes on ☞ ✌ ✑ and ☞ ✌ ✎ and a
-tree index on
✒✓✌ ✑ for all tables. We also performed additional tests using
function based indexes.
✂
Results for 40% current data
Disk CPU Duration
✭✯✭✖❀❁❆✍✭
❆ ✱✂✦✣✱❁✱
✲ ❆ ✱✯✰✯✾
❀✍✭ ✱ ❆ ✱
✦ ✭✖❆ ✲ ❀ ✦
✦ ❆ ✱✯✲❁▲ ❆
✾ ❀ ✦✣▲✂✦
✦✯✦✣✱ ✽✯✽✖❆
✦✣✰✯▲ ✽✯✽✖❆
✦✣✱✍✾ ✭
❆ ✾✂✦✏✰
✭✖❆✿✭✖❀
❆✿✭ ✲ ❀
✾ ✽ ✲ ❀
❀ ✲ ❆✍✭
❆ ✾❁▲❁✰
❀ ✦✯✦✣✲
❆✍✽✯✽❁❀
❆ ✱
✾✯✲
✾❁✰
✱✯✱
✦ ❆ ✲
✦ ✭❁✭
✦✯✦❁✦
✦✼✾❁✲
✦✼✲ ❆
✁
Exp.Type
POINT40-1
MAX40-1
NULL40-1
POINT40-2
MAX40-2
NULL40-2
POINT40-3
MAX40-3
NULL40-3
Our queries were executed on nine different bitemporal
tables, three for each representation of now, with each
table having a random distribution of one million temporal
tuples and a granularity of one day. Within this data set,
each representation of now was tested on three separate
relations: In the first relation, 10% of the tuples overlapped
with the current time in both transaction and valid time. In
the second and third relations this percentage was 20 and
40, respectively.
The tables present our experimental results for 100MB
SGA, where CPU usage is measured in CPU units, duration
is measured in seconds and Experiment Type uses the
notation METHOD - , where METHOD represents the
method used to model now, i.e. either our new POINT
approach (where ☞✝✌ ✑ = ☞✍✌ ✎ and ✒✓✌ ✑ = ✒ ✌ ✎ ) or MAX (using
the Oracle max timestamp “31-DEC-9999”) or NULL;
represents the percentage of the tuples in the experiment
overlapped with the current time in both transaction and
valid time (i.e. either 10, 20 or 40%); and represents the
✄
✗
✗
✄
The results show that the new POINT representation for
now clearly outperforms both MAX and NULL in terms
of disk reads, CPU usage and query duration across the
full range of our problem set. Looking more specifically,
Query One most accurately measures the effect of varying
the percentage of tuples overlapped with current time, as it
retrieves all such tuples. On this measure we can see all
techniques start slowing down as the percentage of overlap increases (i.e. as we move from 10-1 to 20-1 and 40-1)
but also that the relative advantage of POINT over the other
techniques grows as the overlap increases. For instance, at
10-1 the duration times for POINT MAX and NULL are 26,
29 and 32 respectively, whereas at 40-1 this has changed to
48, 75 and 70 (i.e. POINT has gone from being roughly
comparable to NULL and MAX to performing almost as
twice as fast). This demonstrates the distinct advantage
POINT has in retrieving current records, due to POINT representing current records as single points on the valid and
transaction time lines rather than as the intervals defined by
differing start and end points used by MAX.
The performances on Queries Two and Three again show
POINT to be consistently superior, although here the relative performance of the three techniques remains relatively
stable as the overlap percentage increases. This is because
Queries Two and Three are more concerned with past states
of the database and so are not so affected by the proportion
of tuples that overlap the current time.
Also, the disk access results show POINT to be consistently better than MAX or NULL especially on the 10-1
problems for Query One where the duration measures are
fairly similar. According to the Theory of Indexability [6],
the I/O complexity cost measured by the number of disk
accesses for updating and answering queries is one of the
most important factors for measuring performance. This is
because, as technology advances, CPU speeds tend to increase relatively faster than disk I/O speeds. In Figures 2,
3 and 4 we graphically represent our results for the case
when
= 10% (i.e. 10% of tuples overlap with current
time in both valid and transaction time with SGA again set
at 100MB). This gives a clearer representation of the relative performance of each technique and highlights the dominance of POINT especially in terms of disk reads on Query
One.
We also investigated, through further experimental study,
several possible factors that could interfere with our results,
but due the limitations of space we will only briefly mention them. One of the factors that can affect the number
of physical disk reads is the size of SGA. As previously
✄
Figure 2. Number of Physical disk reads
Figure 4. Duration of queries in seconds
4 Conclusion and Future work
This study makes the following contributions to the field:
by investigating different representations of now in
bitemporal databases, we presented a better understanding of the significance of modelling current time,
particularly in the context of efficiently accessing
bitemporal data;
we identified limitations of previous approaches to representing now, namely overloading, the range index
problem and the index redundancy problem;
we proposed a new approach, called POINT, to represent current time in bitemporal databases; and
Figure 3. CPU Usage
mentioned, we looked at the effects of aging buffers by
varying the size of SGA through 4 values: 30Mb , 50Mb,
100Mb and 200Mb. Our results showed small differences
between approaches and favoured larger values of SGA.
The main difference noted was that a smaller SGA size
tends to favour the NULL representation of now, resulting in relatively fewer disk reads. This is because a NULL
value uses less space than a timestamp and so the chance
of the buffer aging is reduced. Apart from this, we did not
notice any significant effects on the results when comparing the different representations of now on different sizes
of SGA, i.e. changes in the number of physical disk reads,
CPU usage and query duration are virtually linear for all the
approaches considered.
in the experimental study we have demonstrated that
the proposed POINT based approach not only overcomes the limitations of previous approaches, but also
leads to a significant improvement in the efficiency of
querying bitemporal databases in comparison to existing methods.
Currently, we are investigating the effect of the proposed
POINT approach on the performance of multidimensional
indexes, such as the spatial R-tree index.
Finally, the POINT-based method to represent now in
temporal databases introduced in this paper, could significantly effect performance of previously proposed index
structures for temporal data. It would be interesting to investigate the effect of POINT on various index structures
for temporal data with respect to space usage and response
time. It would be also interesting to look at the time required
for index restructuring in relation to insertion, deletion and
updating. This investigation should follow the directions
used for the worst case scenario in [10].
References
[1] R. Bliujute, C. S. Jensen, S. Saltenis, and G. Slivinskas. R-tree based indexing of now-relative bitemporal data. In VLDB’98, Proceedings of 24rd International Conference on Very Large Data Bases, New
York City, New York, USA, pages 345–356, 1998.
[2] R. Bliujute, C. S. Jensen, S. Saltenis, and G. Slivinskas. Light-weight indexing of general bitemporal
data. In Statistical and Scientific Database Management, pages 125–138, 2000.
[3] J. Clifford, C. Dyreson, T. Isakowitz, C. S. Jensen,
and R. T. Snodgrass. On the semantics of “Now” in
databases. ACM Transactions on Database Systems
(TODS), 22(2):171–214, 1997.
[4] C. Date, H. Darwen, and N. Lorentzos. Temporal Data
and the Relational Model. Morgan Kaufmann, 2002.
[5] C. E. Dyreson, R. T. Snodgrass, and M. Freiman. Efficiently supporting temporal granularities in a DBMS.
Technical Report TR 95/07, 1995.
[6] J. Hellerstein, E. Koutsupias, and C. Papadimitriou.
On the analysis of indexing schemes. 16th ACM
SIGACT-SIGMOD-SIGART Symposium on Principles
of Database Systems, 1997.
[7] C. S. Jensen. Introduction to temporal databases, research. http://www.cs.auc.dk/ csj/Thesis/pdf/
chapter1.pdf, 2000.
[8] J. Melton and A. R. Simon. SQL:1999 - Understanding Relational Language Components. Morgan Kaufman, 2002.
[9] M. A. Nascimento and M. H. Dunham. Indexing
valid time databases via
-tree. IEEE Transactions
on Knowledge and Data Engineering, 11(6):929–947,
1999.
✝✁
[10] B. Salzberg and V. J. Tsotras. Comparison of access
methods for time evolving data. ACM Computiong
Surveys, 31(1), 1999.
[11] R. Snodgras and et al. The temporal query language
TQEL. ACM TODS, 12(2):247–298, 1987.
[12] R. Snodgrass and I. Ahn. Temporal databases. IEEE
Computer, 19(9):35–42, 1986.
[13] R. T. Snodgrass. Developing Time-Oriented Database
Applications in SQL. Morgan Kaufmann, 2000.
[14] K. Torp, C. S. Jensen, and M. Bohlen. Layered implementation of temporal DBMS concepts and techniques. A TimeCenter Technical Report TR-2, 1999.
[15] V. J. Tsotras, C. S. Jensen, and R. T. Snodgrass. An
extensible notation for spatiotemporal index queries.
ACM SIGMOD Record, 27(1):47–53, 1998.