Special Relativity in The Affine Spacetime

On a rigorous formulation of relativistic spacetime
Michael Yu
June 25, 2012
This article assumes familiarity with abstract mathematics. Properties of, and terminology relating to, sets, abstract vector spaces, and the real numbers will be used freely. The most important notation that may be unfamiliar is $\forall$, a symbol for "for all", and $\exists$, a symbol for "there exists".
Structure of the vector spacetime
Before we investigate the properties of spacetime implied by the experimentally verified postulates of special relativity, we shall develop a mathematical formalisation of physical quantities, which will then allow us to talk about physical things whose mathematical properties are explicitly stated.
1.1 The affine spacetime
Let us first characterize the physical universe as a four-dimensional space of some kind, with the three spatial directions represented in three of the dimensions and time in the remaining dimension. Experimentally, we find that the laws of physics are unaffected by location or time (within human reach), so let us define the spacetime continuum as an affine space over a spacetime vector space. An affine space is just a vector space in which only the differences between elements act as vectors. To quote Wikipedia, an affine space is "what is left of a vector space after you've forgotten which point is the origin". We proceed to define an affine space.
Definition 1.1.1. Let $V$ be a vector space. An affine space over $V$ is a set $A$ together with an addition operator defined as the map
$$V \times A \to A : (v, a) \mapsto v + a$$
which satisfies three properties:
Left identity
$\forall a \in A,\ 0 + a = a$
Associativity
$\forall v, w \in V$ and $\forall a \in A,\ (v + w) + a = v + (w + a)$
Uniqueness
$\forall a \in A,\ V \to A : v \mapsto v + a$ is a bijection
(Copied from Wikipedia for ease of reference.)
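To make the distinction between points and vectors concrete, here is a minimal Python sketch of a two-dimensional affine space over $\mathbb{R}^2$ (two dimensions instead of four, purely for brevity). The classes `Vec` and `Point` are hypothetical illustrations, not part of the article: a vector can be added to a point, two points can only be subtracted, and the three axioms of Definition 1.1.1 are spot-checked on sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec:
    """A displacement in the underlying vector space V = R^2."""
    x: float
    y: float

    def __add__(self, other):
        if isinstance(other, Vec):        # vector + vector -> vector
            return Vec(self.x + other.x, self.y + other.y)
        if isinstance(other, Point):      # the action V x A -> A : (v, a) |-> v + a
            return Point(self.x + other.x, self.y + other.y)
        return NotImplemented


@dataclass(frozen=True)
class Point:
    """An element of the affine space A: it can be moved or subtracted, never added."""
    x: float
    y: float

    def __sub__(self, other):
        if isinstance(other, Point):      # p - q: the unique r with r + q = p
            return Vec(self.x - other.x, self.y - other.y)
        return NotImplemented


ZERO = Vec(0.0, 0.0)
a, b = Point(3.0, -1.0), Point(-2.0, 5.0)
v, w = Vec(1.0, 2.0), Vec(-4.0, 0.5)

assert ZERO + a == a                      # left identity
assert (v + w) + a == v + (w + a)         # associativity
assert (b - a) + a == b                   # uniqueness: b - a is the vector carrying a to b

try:
    a + b                                 # no origin, so points cannot be added
except TypeError:
    pass
else:
    raise AssertionError("adding two points should be undefined")
```

The only way to get a number out of a `Point` is to subtract another `Point`, which mirrors the idea that an affine space has forgotten its origin.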
1.2 Physical vector quantities
Effective as an affine space is as a model for spacetime, it is far more useful and efficient to work with the underlying vector space. In physics we almost always measure or calculate time intervals and distances, rather than absolute times and positions. Therefore we shall focus our attention on the spacetime vector space, which I will call the vector spacetime from here on. For convenience, we will just let the field of scalars of this vector space be the real numbers.
We frequently add together physical quantities with the same units and multiply these quantities by real numbers. Physical quantities that can be added and stretched like this form a one-dimensional vector space. We define a physical vector quantity (or just vector quantity for short) to be any physically motivated vector space of this kind, and we shall call any element of any vector quantity a vector value. Time interval is such a vector quantity and is also a subspace of the vector spacetime. We shall denote this time vector space as $T$. The subspace of spatial vectors, which we shall denote as $X$, makes up the rest of the vector spacetime; it has three dimensions, and as such can be expressed as the direct sum of three spatial vector quantities. In the case of space, we see an important distinction between the physics used in applications and the mathematical physical model we are using. Even though distances in different directions have the same units of measurement (e.g. metres), they do not have the property that would make them into one physical quantity. In particular, one metre in the x direction added to one metre in the y direction does not give you two metres in some direction.
We also identify the zero vectors of all vector quantities as one unique element. This captures the idea that 0 distance, 0 time interval, 0 force, or 0 anything else might as well be the same thing, since they all represent a nil physical quantity. Extending this idea, vector values with different units can actually be added, just as a metre in the x direction can be added to a metre in the y direction; the added vector values simply happen to be linearly independent. Of course, most of the time the resulting vector value is quite meaningless, but if and when we need to do such a thing, the mathematics for it is ready. Nonetheless, note that any linear combination of vector values is a vector value in some vector quantity. Let us denote the sum (in the sense of a sum of vector spaces) of all vector quantities, which happens to be the set of all vector values, as $V$.
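One way to picture $V$ is as formal linear combinations of basis vector values, one basis element per vector quantity. The following Python sketch assumes a hypothetical `VectorValue` class that stores real coefficients keyed by a unit label (the labels `"m_x"`, `"m_y"` and `"s"` are illustrative only). It shows the single shared zero, that adding values from the same quantity stays in that quantity, and that a metre in x plus a metre in y is not two metres in either direction.

```python
from collections import defaultdict


class VectorValue:
    """A formal linear combination of unit labels (a hypothetical model of V)."""

    def __init__(self, coeffs=None):
        # Drop zero coefficients so that every vector quantity shares one zero element.
        self.coeffs = {k: c for k, c in (coeffs or {}).items() if c != 0}

    def __add__(self, other):
        total = defaultdict(float, self.coeffs)
        for unit, c in other.coeffs.items():
            total[unit] += c
        return VectorValue(dict(total))

    def scale(self, a):
        return VectorValue({k: a * c for k, c in self.coeffs.items()})

    def __eq__(self, other):
        return self.coeffs == other.coeffs

    def __repr__(self):
        return f"VectorValue({self.coeffs})"


metre_x = VectorValue({"m_x": 1.0})
metre_y = VectorValue({"m_y": 1.0})
second = VectorValue({"s": 1.0})

# 0 metres and 0 seconds are the same element of V.
assert metre_x.scale(0) == second.scale(0) == VectorValue()

# Same vector quantity: the sum is a scalar multiple, so it stays in that quantity.
assert metre_x + metre_x == metre_x.scale(2)

# Different vector quantities: the sum is a new, linearly independent combination,
# not "two metres" in either direction.
diagonal = metre_x + metre_y
assert diagonal != metre_x.scale(2) and diagonal != metre_y.scale(2)
```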
Just as often in physics, we need to take the product of two vector values to get another vector value. To avoid confusion with all the other products in physics, let us call the product developed in this article the physical product, and assume that every product of two elements of $V$ appearing later denotes this physical product unless otherwise stated. Now that we have the two most basic operators that make up physical relationships and laws, we shall make a wish list of properties that we would like them to have, so that they can best model the existing mathematics in physics.
Addition axioms
Closure under addition
$\forall v, w \in V,\ v + w \in V$
Associativity of addition
$\forall u, v, w \in V,\ u + (v + w) = (u + v) + w$
Commutativity of addition
$\forall v, w \in V,\ v + w = w + v$
Identity element under addition
$\exists \mathbf{0} \in V$ such that $\forall v \in V,\ v + \mathbf{0} = v$
Inverse elements under addition
$\forall v \in V,\ \exists w \in V$ such that $v + w = \mathbf{0}$
All of these addition axioms are implied by the definition of $V$ as a vector space, with the addition operator being the usual addition on an abstract vector space and $\mathbf{0}$ being the zero vector. From now on we will drop the bold and simply write $0$. It is then only really the physical product that we need a wish list for.
Product axioms
Closure under product
$\forall v, w \in V,\ vw \in V$
Associativity of product
$\forall u, v, w \in V,\ u(vw) = (uv)w$
Commutativity of product
$\forall v, w \in V,\ vw = wv$
We have physical quantities like a second and a hertz, which multiply to the number 1, which in turn acts as a multiplicative identity. Since the closure property of the physical product is important, there must exist a vector quantity containing a product identity element. With the following three axioms we finish the product axioms wish list.
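Identity element under product
$\exists \mathbf{1} \in V$ such that $\forall v \in V,\ v\mathbf{1} = v$
Inverse elements under product
$\forall v \in V \setminus \{0\},\ \exists w \in V$ such that $vw = \mathbf{1}$
Homogeneity under scalar multiplication
$\forall v, w \in V$ and $\forall a \in \mathbb{R},\ a(vw) = (av)w = v(aw)$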
As it will turn out, the physical product is actually not commutative for all vector values. With this in mind, we note that ($\forall v \in V,\ \mathbf{1}v = v$) and ($\forall v \in V \setminus \{0\},\ \exists w \in V$ such that $vw = \mathbf{1} = wv$) are implied by the product axioms even without commutativity. A proof of this will not be included, since we could just as well define them to be true. Even without assuming commutativity, it is easy to prove that $\forall a \in \mathbb{R}$ and $\forall v \in V,\ av = (a\mathbf{1})v$. As such, taking the physical product with elements of $\operatorname{span}(\mathbf{1})$ is exactly the same thing as scalar multiplication. Let us thus identify $a\mathbf{1}$ with the real number $a$, so that for $v \in V$ we use $av$ to represent both $av$ and $(a\mathbf{1})v$, we use $a + v$ to represent $(a\mathbf{1}) + v$, and we use $a$ to represent $a\mathbf{1}$ anywhere that $a$ needs to be a physical vector value. Now we have the real numbers represented in a physical vector space.
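The step described as easy to prove needs only homogeneity and the left identity $\mathbf{1}v = v$. Written out, for all $a \in \mathbb{R}$ and $v \in V$:
$$(a\mathbf{1})v = a(\mathbf{1}v) = av,$$
where the first equality is homogeneity under scalar multiplication, $a(uw) = (au)w$, applied with $u = \mathbf{1}$ and $w = v$, and the second is the left identity.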
We have one last axiom to put on our wish list.
Distributive law
Distribution of product over addition
$\forall u, v, w \in V,\ (u + v)w = uw + vw$ and $u(v + w) = uv + uw$
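The quaternions, which the article later adopts for spatial vectors, give a concrete algebra satisfying every axiom on this wish list except commutativity. The Python sketch below is a spot check on random samples, not a proof; it assumes a hand-written Hamilton product on 4-tuples `(w, x, y, z)` meaning $w + xi + yj + zk$ (the helper names `qmul`, `qadd`, `qscale`, `qinv` and `close` are hypothetical).

```python
import random


def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) = w + xi + yj + zk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qadd(p, q):
    return tuple(s + t for s, t in zip(p, q))


def qscale(a, p):
    return tuple(a * s for s in p)


def qinv(p):
    """Inverse of a nonzero quaternion: its conjugate divided by its squared norm."""
    n2 = sum(c * c for c in p)
    return (p[0] / n2, -p[1] / n2, -p[2] / n2, -p[3] / n2)


def close(p, q, tol=1e-9):
    return all(abs(s - t) < tol for s, t in zip(p, q))


def rand_q():
    return tuple(random.uniform(-2.0, 2.0) for _ in range(4))


ONE = (1.0, 0.0, 0.0, 0.0)
random.seed(0)
for _ in range(100):
    u, v, w = rand_q(), rand_q(), rand_q()
    a = random.uniform(-3.0, 3.0)
    assert close(qmul(u, qmul(v, w)), qmul(qmul(u, v), w))                # associativity
    assert close(qmul(v, ONE), v) and close(qmul(ONE, v), v)              # identity, both sides
    assert close(qmul(v, qinv(v)), ONE) and close(qmul(qinv(v), v), ONE)  # inverses (v != 0)
    assert close(qscale(a, qmul(v, w)), qmul(qscale(a, v), w))            # a(vw) = (av)w
    assert close(qscale(a, qmul(v, w)), qmul(v, qscale(a, w)))            # a(vw) = v(aw)
    assert close(qmul(qadd(u, v), w), qadd(qmul(u, w), qmul(v, w)))       # right distributivity
    assert close(qmul(u, qadd(v, w)), qadd(qmul(u, v), qmul(u, w)))       # left distributivity
    assert close(qmul(qscale(a, ONE), v), qscale(a, v))                   # (a1)v = av

# Commutativity fails: ij = k but ji = -k.
i, j = (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)
assert qmul(i, j) == (0.0, 0.0, 0.0, 1.0)
assert qmul(j, i) == (0.0, 0.0, 0.0, -1.0)
```

Random sampling only probes the identities; that the Hamilton product is associative and distributive is a standard fact about $\mathbb{H}$.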
1.3 The vector spacetime
Let us now confirm that the axioms of the physical product are physically sound. We begin by using some familiar physical quantities to derive the structure of the physical product on the vector quantities we are currently concerned with, namely time and three-dimensional space. One telltale physical relation comes from established physics: for a body of mass $m$ moving with constant velocity $v$ through a displacement $s$ in time $t$,
$$E_k = \frac{1}{2}m|v|^2 \implies E_k = \frac{1}{2}m\frac{|s|^2}{t^2} \implies 2E_k m^{-1} t^2 = |s|^2.$$
The left side of the final equation should be a vector value lying in a single vector quantity, because any two values of that form can be added together and are scalar multiples of each other. This implies that even though the spatial vector value on the right side can be any vector from three-dimensional space, its norm squared is always an element of one particular vector quantity. However, to talk about $|s|$ as a physical quantity would require the construction of a directionless distance vector quantity, which moreover is not a good model of reality, since in physics we never see equations like $|s_1| + |s_2|$. Therefore, motivated by the fact that for a general vector $v$ representing a physical quantity, $|v|^2 = v \cdot v$, where the product on the right side is the dot product, we define the physical product to have the property that $\forall x, y \in X \setminus \{0\},\ \exists a \in \mathbb{R}^+$ such that $ax^2 = y^2$. The scalar coefficient is restricted to be positive because when mass and time interval are positive, the kinetic energy is always positive no matter the displacement. In particular, a vector value of energy has the same sign no matter the displacement, so $x^2$, where $x \in X$, should also always have the same sign.
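A quick numerical illustration of the first claim, in ordinary Euclidean terms: the masses, displacements and times below are arbitrary sample values chosen for illustration, and `two_ek_t2_over_m` is a hypothetical helper. For each body, $2E_k m^{-1} t^2$ reproduces $|s|^2$, and all such values are positive multiples of one another, so they sit in one one-dimensional vector quantity with a fixed sign.

```python
import math


def two_ek_t2_over_m(m, s, t):
    """2 E_k t^2 / m for a body of mass m covering displacement s in time t."""
    speed_sq = sum(c * c for c in s) / (t * t)
    e_k = 0.5 * m * speed_sq
    return 2 * e_k * t * t / m


samples = [
    (2.0, (3.0, 4.0, 0.0), 1.5),
    (0.5, (0.0, -1.0, 2.0), 4.0),
    (10.0, (6.0, 0.0, -8.0), 0.25),
]
values = [two_ek_t2_over_m(m, s, t) for m, s, t in samples]

for (m, s, t), val in zip(samples, values):
    assert math.isclose(val, sum(c * c for c in s))   # equals |s|^2
assert all(val / values[0] > 0 for val in values)     # positive multiples of one another
```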
Anyway, it would appear I have run out of time. Therefore I shall skip the journey to the insight and cut to the gold. As it turns out, the best way to keep as many of the desired properties of the physical product as possible is to have $xy = q(x^2)$, where $x, y \in X$ and $q \in \mathbb{H}$, i.e. the set of quaternions. This comes from the motivation that if $x$ and $y$ are orthogonal spatial vectors and $x^2 = y^2$, we want
$$\left(\frac{1}{\sqrt{2}}x + \frac{1}{\sqrt{2}}y\right)^2 = x^2,$$
because spatial displacements satisfy the Pythagorean equality. But with the distributive property and the other more basic axioms, we have
$$\left(\frac{1}{\sqrt{2}}x + \frac{1}{\sqrt{2}}y\right)^2 = \frac{1}{2}x^2 + \frac{1}{2}xy + \frac{1}{2}yx + \frac{1}{2}y^2 = x^2 + \frac{1}{2}(xy + yx).$$
This forces $xy + yx = 0$, i.e. $xy = -yx$; together with the very physically accurate condition that $xy \neq 0$, this means $xy \neq yx$, so the physical product cannot be commutative. This is perfectly modeled by $x = vy$, where $v$ is a unit purely imaginary quaternion. Furthermore, an exploration of the inverses of spatial vectors shows that if $x$ and $y$ are orthogonal spatial vectors with $x^2 = y^2$, then $(xy^{-1})^2 = -1$, and this essentially constructs the quaternions as a subspace of $V$.
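These claims can be spot-checked numerically. The Python sketch below assumes, as the article does, that spatial vectors are modelled by purely imaginary quaternions; the helpers `qmul`, `qinv`, `close` and `cross` are hypothetical. It builds a random pair of orthogonal spatial vectors with equal squares and checks that the squares are equal negative reals (so all squares share a sign), that $xy = -yx \neq 0$, that the Pythagorean requirement holds, that $v = xy^{-1}$ is a unit purely imaginary quaternion (so $x = vy$), and that $(xy^{-1})^2 = -1$.

```python
import math
import random


def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qinv(p):
    """Inverse of a nonzero quaternion: its conjugate divided by its squared norm."""
    n2 = sum(c * c for c in p)
    return (p[0] / n2, -p[1] / n2, -p[2] / n2, -p[3] / n2)


def close(p, q, tol=1e-9):
    return all(abs(s - t) < tol for s, t in zip(p, q))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


random.seed(2)
r = [random.uniform(-1, 1) for _ in range(3)]
u = [random.uniform(-1, 1) for _ in range(3)]
c = cross(r, u)                                    # c is orthogonal to r
k = math.sqrt(sum(t * t for t in r) / sum(t * t for t in c))
x = (0.0, *r)                                      # spatial vector as a pure quaternion
y = (0.0, *(k * t for t in c))                     # orthogonal to x, rescaled so y^2 = x^2

xx = qmul(x, x)
assert xx[0] < 0 and close(xx, qmul(y, y))         # x^2 = y^2, a negative real number
assert close(qmul(x, y), tuple(-t for t in qmul(y, x)))  # xy = -yx
assert any(abs(t) > 1e-9 for t in qmul(x, y))      # and xy != 0

s = tuple((p + q) / math.sqrt(2) for p, q in zip(x, y))
assert close(qmul(s, s), xx)                       # (x/sqrt2 + y/sqrt2)^2 = x^2

v = qmul(x, qinv(y))                               # the v with x = v y
assert abs(v[0]) < 1e-9 and abs(sum(t * t for t in v) - 1) < 1e-9  # unit, purely imaginary
assert close(qmul(v, v), (-1.0, 0.0, 0.0, 0.0))    # (x y^{-1})^2 = -1
```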
All spatial vectors can be represented as a purely imaginary quaternion multiple of a value from one fixed vector quantity. Let this vector quantity be time interval. Thus time interval in our mathematical model has a unit that, when squared, gives the same unit as distance squared. Due to the importance of the speed of light, given $t^2 = -x^2$, let $t$ represent the time necessary for light to travel a spatial displacement of $x$. Let $S$ denote $\operatorname{span}(x^2)$.
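A small check of the light condition in the quaternion model. As an assumption made only for illustration, the Python sketch below encodes a time interval $t$ as the real quaternion $(t, 0, 0, 0)$ via a hypothetical helper `time_interval`, and forms a displacement $x = vt$ from a purely imaginary $v$. For a unit $v$ (light), $t^2 + x^2 = 0$, i.e. $t^2 = -x^2$; for $|v| < 1$ the sum stays positive.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def time_interval(t):
    """Hypothetical encoding: a time interval t as the real quaternion (t, 0, 0, 0)."""
    return (float(t), 0.0, 0.0, 0.0)


def interval(t, x):
    """Real part of t^2 + x^2; both squares are real numbers in this model."""
    return qmul(t, t)[0] + qmul(x, x)[0]


t = time_interval(2.5)

v_light = (0.0, 0.6, 0.0, 0.8)    # unit purely imaginary quaternion (0.36 + 0.64 = 1)
x_light = qmul(v_light, t)        # displacement light covers in time t
assert abs(interval(t, x_light)) < 1e-12      # t^2 = -x^2, i.e. t^2 + x^2 = 0

v_slow = (0.0, 0.3, 0.0, 0.4)     # |v| = 0.5: slower than light
x_slow = qmul(v_slow, t)
assert interval(t, x_slow) > 0                # t^2 + x^2 > 0 below light speed
```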
The spacetime interval
In this section, let us agree that $t$ will denote a time interval value, $x$ will denote a spatial displacement value, $v$ will denote a purely imaginary quaternion representing $xt^{-1}$ for some $x$ and $t$, and $s$ will denote an element of $S$.
2.1 Inertial frame shifts
The notion of relativity is best explored by looking at the points of view of observers moving relative to each other. We proceed to define inertial reference frames.
Definition 2.1.1. An inertial reference frame is a way of describing the universe as an affine spacetime such that objects moving at constant velocity keep moving at that velocity unless a force acts on them. The postulates of special relativity give further properties of inertial reference frames.
Principle of relativity: The laws of physics, including physical constants, hold in every inertial reference frame and are the same in all of them. Which laws of physics are included here will be introduced as they are needed. Furthermore, all the laws of physics are the same no matter where and when in the universe the law is tested. Also, no law of physics should change depending on the intrinsic labelling of directions.
Constancy of the speed of light: There is a speed such that anything travelling at that speed in one reference frame also travels at that speed in every other reference frame. Experimentally, this speed is the speed of light.
We want to turn the postulates of relativity into mathematical properties that spacetime must satisfy. First, we shall work with inertial frame shifts, which are transformations mapping all the affine spacetime points in one inertial frame to those in another. Since any two affine spaces over real vector spaces of the same dimension (in our case four) are isomorphic, inertial frame shifts are merely functions from the affine spacetime to itself. Let us be mindful of which subspace is the time dimension in the affine spacetime before and after the shift, since time is distinct from the spatial dimensions. As with our analysis of physical vector quantities, we shall see that it is more convenient to work with the vector spacetime. Firstly, for elements $p$ and $q$ of an affine space, we define $p - q$ to be the unique vector $r$ such that $r + q = p$.
Theorem 2.1.2. Any function $f$ mapping an affine space $A$ to another affine space $Z$ can be completely described by a function $g$ defined by $g(a) = f(a)$ for some constant $a \in A$ and $g(b) = h(b - a) + g(a)$, where $h$ is a function from the vector space underlying $A$ to the vector space underlying $Z$.
Proof. Choose any $a \in A$ and fix $g(a) = f(a)$. Now define $h(v) = f(v + a) - f(a)$, where $v$ ranges over the vector space underlying $A$. Then for all $b \in A$,
$$g(b) = h(b - a) + g(a) = f((b - a) + a) - f(a) + g(a) = f(b) - f(a) + f(a) = f(b).$$
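The construction in the proof is mechanical, so it can be sketched directly. In the Python below, points of $A$ and $Z$ are modelled as coordinate tuples in $\mathbb{R}^2$ (an assumption for illustration), `f` is an arbitrary, deliberately nonlinear map, and the hypothetical helpers `make_h` and `make_g` build $h(v) = f(v + a) - f(a)$ and $g(b) = h(b - a) + f(a)$ for a chosen base point $a$; the loop confirms that $g$ reproduces $f$.

```python
import math


def sub(p, q):
    """p - q: the unique vector r with r + q = p."""
    return tuple(s - t for s, t in zip(p, q))


def act(v, p):
    """v + p: a vector acting on a point."""
    return tuple(s + t for s, t in zip(v, p))


def f(p):
    """An arbitrary (nonlinear) map from A to Z, chosen only for illustration."""
    x, y = p
    return (math.sin(x) + y * y, x * y - 3.0)


def make_h(f, a):
    """h(v) = f(v + a) - f(a), as in the proof of Theorem 2.1.2."""
    return lambda v: sub(f(act(v, a)), f(a))


def make_g(f, a):
    h = make_h(f, a)
    return lambda b: act(h(sub(b, a)), f(a))   # g(b) = h(b - a) + g(a), with g(a) = f(a)


a = (0.5, -1.0)
g = make_g(f, a)
for b in [(0.0, 0.0), (2.0, 3.0), (-1.5, 4.25)]:
    assert all(abs(s - t) < 1e-12 for s, t in zip(g(b), f(b)))
```

Nothing here requires $f$ to be linear; the theorem only says that one base point and one vector-to-vector function $h$ are enough to recover $f$.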
Since anyone concerned with two different reference frames can just agree on some common point of space and time, we only need to investigate inertial frame shifts as mappings from the vector spacetime to itself. Let us agree to use frame shift to mean the mapping from the vector spacetime to itself that completely describes an inertial frame shift. Let us use $F$ to denote any frame shift, and let us use $t' + x'$ to denote $F(t + x)$. First we note that frame shifts are invertible, and therefore bijective, since by definition we are allowed to shift back from the new inertial reference frame to the old one. By the postulates of special relativity, the spacetime location does not affect physical laws, which include the laws governing frame shifts; so, using the notation of the proof of Theorem 2.1.2, $h$ must be the same mapping no matter which $a$ we choose.
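Then with $v, w \in T \oplus X$,
$$\begin{aligned}
h(v) + h(w) &= \big(f(v + a) - f(a)\big) + \big(f(w + a) - f(a)\big) \\
&= \big(f(v + (w + a)) - f(w + a)\big) + \big(f(w + a) - f(a)\big) \\
&= f((v + w) + a) - f(a) \\
&= h(v + w),
\end{aligned}$$
where the second line uses the fact that $h$ does not depend on the base point (here $w + a$), and the third uses $(p - q) + (q - r) = p - r$ together with associativity.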
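The role of translation invariance in this argument can be seen numerically. The hypothetical Python sketch below treats points and vectors of a two-dimensional affine space as plain coordinate tuples and compares a map of the form $f(p) = Mp + c$ (illustrative coefficients), whose $h$ is the same at every base point and turns out additive, with a nonlinear map whose $h$ changes with the base point and is not additive.

```python
def add(u, v):
    """Componentwise sum (used both for vector + vector and vector + point)."""
    return tuple(s + t for s, t in zip(u, v))


def sub(p, q):
    """p - q for points (or vectors) given as coordinate tuples."""
    return tuple(s - t for s, t in zip(p, q))


def h_at(f, a):
    """h(v) = f(v + a) - f(a), computed with base point a."""
    return lambda v: sub(f(add(v, a)), f(a))


def affine(p):
    """f(p) = M p + c with a fixed matrix M and offset c (illustrative values)."""
    x, y = p
    return (2.0 * x - y + 7.0, 0.5 * x + 3.0 * y - 2.0)


def bent(p):
    """A nonlinear map whose h depends on the base point (illustrative)."""
    x, y = p
    return (x * x, y)


v, w = (1.0, 2.0), (-3.0, 0.5)
origin, other = (0.0, 0.0), (4.0, -1.0)

# Affine map: h is the same at every base point, and h(v) + h(w) = h(v + w).
assert h_at(affine, origin)(v) == h_at(affine, other)(v)
h = h_at(affine, origin)
assert add(h(v), h(w)) == h(add(v, w))

# Nonlinear map: h changes with the base point and is not additive.
assert h_at(bent, origin)(v) != h_at(bent, other)(v)
hb = h_at(bent, origin)
assert add(hb(v), hb(w)) != hb(add(v, w))
```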
This shows that frame shifts are additive. Now consider $t$ and $F(t)$. By the additivity of frame shifts, $F(nt) = nF(t)$ for $n \in \mathbb{N}$. Suppose $u = mt$ with $m \in \mathbb{N}$. Then
$$F(u) = mF(t) \implies \frac{1}{m}F(u) = F\left(\frac{1}{m}u\right).$$
Combining the two results gives $F\left(\frac{n}{m}u\right) = nF\left(\frac{1}{m}u\right) = \frac{n}{m}F(u)$, and additivity also gives $F(0) = 0$ and $F(-t) = -F(t)$, so therefore $F(rt) = rF(t)$ for all rational $r$. Honestly, this has taken me so, so, so, so, so much more time than I had anticipated, and it's 5:26 AM, so I'm just going to get straight to the point and summarize everything. Just assume that frame shifts are homogeneous of degree one over the real numbers when the argument is a pure time interval, and use Newton's first law to derive that frame shifts are homogeneous for any argument. Thus frame shifts are just linear operators on $T \oplus X$.
To formalize the constancy of the speed of light, we set $t^2 + x^2 = 0 \iff t'^2 + x'^2 = 0$ (recall that $t^2$ and $x^2$ are defined to always have opposite signs). From the linearity of frame shifts, I proved the stronger statement $t^2 + x^2 = t'^2 + x'^2$. The spacetime interval $t^2 + x^2$ is therefore invariant under frame shifts. From this, and the fact that direction does not affect physical laws, I derived the time dilation formula: if $F(t) = t' + vt'$, then
$$t' = \frac{t}{\sqrt{1 - |v^2|}}.$$
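Since the derivation is only summarized here, a hedged numerical cross-check may help. The Python sketch below swaps in the standard textbook Lorentz boost along one axis, in units where the speed of light is 1; the article lists this transformation as a next step rather than deriving it, and the helpers `boost` and `interval` are hypothetical. It represents the article's $x^2$ by $-|x|^2$, confirms that $t^2 + x^2$ is unchanged by the boost (so light-like events stay light-like), and checks that a clock at rest in the old frame satisfies the time dilation formula.

```python
import math


def boost(t, x, u):
    """Standard 1+1D Lorentz boost with speed u (units where c = 1).

    Textbook form, swapped in for illustration; it is not derived in the article.
    """
    gamma = 1.0 / math.sqrt(1.0 - u * u)
    return gamma * (t - u * x), gamma * (x - u * t)


def interval(t, x):
    """t^2 - |x|^2, i.e. the article's t^2 + x^2 with x^2 = -|x|^2."""
    return t * t - x * x


# The interval is invariant; the event (2, 2) is light-like and stays so.
for t, x in [(3.0, 1.0), (2.0, 2.0), (5.0, -4.0)]:
    t_b, x_b = boost(t, x, 0.6)
    assert math.isclose(interval(t, x), interval(t_b, x_b), abs_tol=1e-9)

# Time dilation: a clock at rest at the old origin, read at time t, appears in the
# new frame at time t' and position v t'. Only |v| matters here, and |v^2| = |v|^2.
t, u = 4.0, 0.6
t_new, x_new = boost(t, 0.0, u)
v = x_new / t_new
assert math.isclose(t_new, t / math.sqrt(1.0 - v * v))
assert math.isclose(t_new, 5.0)
```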
The next steps are to derive the complete Lorentz transformations and then use $F = ma$ to derive relativistic mass.
