“In mathematics you don’t understand things. You just get used to them.” (John von Neumann)
Although special relativity is a theory of physics, the chief ingredient in deriving its astonishing results about space and time is mere logical thinking. Besides that, only surprisingly few initial experimental facts are needed to develop the theory.
In general, mechanics studies the laws of motion. The basic idea is that physical objects exert forces on one another during their interactions, and it is the forces, or lack thereof, that eventually determine motion. For the most part, classical mechanics was developed by observing solid objects in our environment, including heavenly bodies.
In the following, some of the core concepts, laws and assumptions of classical mechanics are presented. We focus on rigid objects, because that suffices for our purposes.
Properties of space
Space is static and homogeneous (incl. isotropy). It consists of locations that physical objects can occupy. To identify locations, coordinate systems are used.
(Definition) A Cartesian coordinate system that is fixed to a rigid object is called a reference frame.
Properties of time
Time is homogeneous and constantly passing. It consists of durations that events can occupy. To identify (measure) durations, clocks are used.
Time is absolute: it permeates the whole universe. All physical objects, even the most remote ones, are connected through time. They all “experience” the very same time.
Locations of space
Reference frames identify locations only momentarily: a millisecond later we can’t tell whether a location that is static in a given reference frame still corresponds to the same (absolute) location in space.
Still, it is assumed there exist reference frames that are at rest in space and thus permanently identify the locations of space. There is no known way of finding such a reference frame though.
(Definition) A reference frame in which the law of inertia holds is called an inertial frame.
The law of inertia states that the velocity of a particle (= point-like rigid object) may only change if there is a force acting upon the particle.
Note: velocity is speed together with its direction; i.e. it’s a vector.
Intuitively, this means that if all forces acting upon a particle were “switched off”, it would continue to move at the constant velocity it has reached until then. Thus, there is a physical reason to think that not all reference frames are “equal”, but some of them are “special”, namely the inertial frames.
(Theorem) Given any one inertial frame, exactly those reference frames are inertial frames which are moving at a constant velocity relative to it.
Note: a practical way of switching off forces is to net them out vectorially.
Principle of relativity
In every inertial frame, the laws of motion are exactly the same. That is, with respect to classical mechanics, inertial frames are all equivalent. The following, plausible generalization is called the principle of relativity, and its validity will be assumed everywhere in this essay:
(Principle) The laws of nature are exactly the same in every inertial frame. That is, inertial frames are all equivalent in describing any physical phenomena.
It’s a well-established fact that a given ray of light propagates in vacuum at the constant speed c ≈ 300’000’000 m/s, relative to every inertial frame. This contradicts classical mechanics, according to which the speed of a given entity, here that of the tip of the ray of light, should depend on the reference frame in which it is observed.
Note: throughout this essay, light is always meant to travel in vacuum.
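To make the contradiction concrete, here is a minimal sketch of the classical (Galilean) rule for changing frames along one axis. The function name and the sample value of 10% of c are illustrative assumptions, not part of the essay.

```python
C = 300_000_000.0  # speed of light in vacuum in m/s (rounded, as in the text)


def classical_speed_in_moving_frame(u: float, v: float) -> float:
    """Velocity of an entity along the x-axis as seen from a moving frame.

    Classical (Galilean) rule: velocities simply subtract.
    u: velocity of the entity relative to frame K (m/s)
    v: velocity of frame K' relative to K (m/s)
    """
    return u - v


# Hypothetical frame K' moving at 10% of c relative to K,
# in the same direction as the tip of a ray of light.
v = 0.1 * C
predicted = classical_speed_in_moving_frame(C, v)

# Classical mechanics predicts about 0.9 c in K', whereas measurement gives c.
print(predicted / C)
```

The mismatch between the predicted value and the measured c is exactly the contradiction described above.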
To eliminate the contradiction, it seems necessary that our intuitive concepts about space and time are revisited and challenged. In order to ensure that we are not misled by intuition, Einstein suggested that all definitions in physics must be based on measurements which are, in principle, feasible.
Scope of study
In the following, we limit our attention to phenomena that take place in inertial frames, and in which rigid objects, forces acting upon them, as well as rays of light are involved. A clear distinction is made between rigid objects and light: the latter is considered more like a signal, not as an “ordinary” physical object. Elastic solid objects (threads, springs, etc.) are also allowed, but used only as a means to exert forces on rigid objects in a measurable way.
All in all, the intended scope is the most general as far as the motion of rigid objects in inertial frames is concerned: there are no restrictions expected that would limit the applicability of the results obtained later.
The scope of a theory
(Definition) The circumstances under which a theory can be considered correct are called the scope of the theory.
The originally intended scope can shrink as new facts come to light. This happens when it turns out that some (often tacit) assumptions cannot be considered correct in all situations. In special relativity, two such assumptions are Euclidean geometry and continuous quantities. Both have already been challenged by later developments in physics.
Note: there are plenty of other, tacit assumptions which we don’t even notice. We can rest assured that all those will be challenged some day.
Still, it is more constructive to say that the assumptions are correct within limits, rather than saying they are just incorrect.
Foundations of relativity
The next couple of sections lay, based on Einstein’s suggestions, the measurable foundations needed for the discussion of relativistic effects coming afterwards. Although almost all of the concepts and results here seem intuitive (and even banal from time to time), they will be of the utmost importance for clarity and understanding when proceeding further.
It’s also demonstrated how cumbersome things can get when the classical ground is cut from under one’s feet, i.e. when only those things can be taken for granted that either have been measured or were inferred from the symmetries of a given situation. Putting it another way, we’re going to describe what one can say about space and time without blindly assuming anything that isn’t supported by tangible evidence, while at the same time making no use of the constancy of the speed of light. The latter will be used only afterwards, in the parts on relativistic effects.
We do make one assumption though to start with: that the geometry in every inertial frame is Euclidean. Keep in mind that our final goal is an adjustment to classical mechanics, as minimal as possible, that eliminates all contradictions posed by the constancy of the speed of light. And the Euclidean-ness of geometry is not among the top suspects to doubt.
Lastly, the word “symmetry” appears in many of the proofs. In the context of an argument, it refers to a kind of reasonless-ness, the common sense that two things must be identical if there is no sensible reason for difference. It indicates a level where we still trust our intuition.
Note: the basis of all physical symmetries within a single inertial frame is the principle of relativity, as it ensures that the inertial frame can be considered on its own, while properties like its motion relative to other, maybe even “favored”, inertial frames do not matter.
In an inertial frame K, duration can be measured at each single location by using a uniformly ticking clock that is fixed right there.
Note: uniformity is ensured by producing all ticks by the very same method, so that we cannot think of any reason why one tick should take longer than the other.
The simultaneity of events that happen directly to entities at rest relative to K is measured as follows:
(Definition) Two events are simultaneous in K if and only if a symmetrically placed observer in K sees them, by the naked eye (through vacuum), happen simultaneously.
This definition is compatible with the classical conception of time.
Notes: (a) by saying “observer in K”, it is meant that the observer is also at rest relative to K, (b) we can imagine that each entity sends a light signal to the observer when its local event happens, (c) here and in the following, locations, events, entities, observers, signals and clocks are, unless explicitly stated otherwise, always meant to be point-like, (d) accordingly, by “light signal” it is understood just the tip of a ray of light.
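The definition can be mimicked in a small simulation. The sketch below is only an illustration under stated assumptions: the event times are bookkeeping values known to the simulation (the observer compares nothing but arrival times), the signals are assumed to propagate at the same speed in every direction, and all function names are hypothetical.

```python
import math

C = 300_000_000.0  # m/s, rounded as in the text


def arrival_time(event_time, event_pos, observer_pos, signal_speed=C):
    """Time at which a signal emitted at the event reaches the observer."""
    return event_time + math.dist(event_pos, observer_pos) / signal_speed


def is_symmetrically_placed(observer_pos, pos_a, pos_b, tol=1e-9):
    """True if the observer is equidistant from the two event locations."""
    return math.isclose(
        math.dist(observer_pos, pos_a), math.dist(observer_pos, pos_b), abs_tol=tol
    )


def judged_simultaneous(event_a, event_b, observer_pos, tol=1e-12):
    """Apply the definition: each event is a (time, position) pair.

    The observer must be symmetrically placed; the events count as
    simultaneous if and only if their light signals arrive together.
    """
    (t_a, p_a), (t_b, p_b) = event_a, event_b
    if not is_symmetrically_placed(observer_pos, p_a, p_b):
        raise ValueError("observer must be equidistant from both events")
    return math.isclose(
        arrival_time(t_a, p_a, observer_pos),
        arrival_time(t_b, p_b, observer_pos),
        abs_tol=tol,
    )


# Observer 5 m away from each of two locations that are 6 m apart.
observer = (0.0, 4.0, 0.0)
left, right = (-3.0, 0.0, 0.0), (3.0, 0.0, 0.0)
print(judged_simultaneous((0.0, left), (0.0, right), observer))
print(judged_simultaneous((0.0, left), (1e-6, right), observer))
```

An observer who is not equidistant is rejected outright, which reflects that the definition says nothing about what such an observer would conclude.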
Instead of light, other signals could also be used, e.g. pistol bullets or even carrier-pigeons, as long as the symmetry of their propagation can be assumed. Since we are talking about mechanics, it seems reasonable to require only the symmetry of the net forces acting upon the chosen pair of (identical) “messengers”. Einstein’s original approach eliminates this requirement elegantly: it uses light signals and assumes that nothing whatsoever can make a difference to their propagation in vacuum. The same is definitely not true for e.g. a bullet whose motion is affected by a multitude of gravitational forces at the very least.
(Definition) Two clocks (at rest) in K are synchronized if the same positions of their hands are simultaneous events in K.
Notes: (a) in general, by saying “E in K” it is meant that E is at rest relative to K, (b) we’ll always assume that all clocks are of the same construction.
Using mostly symmetry-based reasoning, we infer all the below:
(Theorem) If two clocks in K show one given position of their hands simultaneously, then they show every position of their hands simultaneously.
(Proof) There is no such difference in the situations of the two clocks that would explain why a symmetrically placed observer would see them showing different times, ever.
(Lemma) If a light signal is sent earlier from one location to another in K, it also arrives earlier.
(Proof) If synchronized clocks are used at the two locations, the difference between the time values at sending and arrival, as shown by the corresponding co-located clocks, is always the same. This can be seen if we add another pair of synchronized clocks that, when sending the second light signal, both show the time as it was when the first one was sent. The original and the added clocks then run in parallel, and there is no reason why the latter would not show the same time difference for the second light signal as the former showed for the first one.
(Theorem) All symmetrically placed observers in K judge the simultaneity of two events the same way.
(Proof) Let o1 and o2 be two symmetrically placed observers, and let o2 send light signals s1 and s2 to o1 upon seeing the two events e1 and e2, respectively. Due to symmetry, o1 will observe the same time difference between seeing s1 and e1 as that between seeing s2 and e2. Thus e1 and e2 are simultaneous to o1 if and only if s1 and s2 are sent by o2 at the same moment.
(Theorem) Simultaneity in K is transitive.
(Proof) Let event e1 be simultaneous with e2, and e2 with e3. If any two of the events are co-located, the statement is trivial. Otherwise, create an event e4 which is simultaneous with e2 but not located on any of the e1e2, e1e3, e2e3 lines. Then, an observer at the circumcenter of the triangle e1e2e4 will see that e1 and e4 are simultaneous. The same is true for the triangle e4e2e3, i.e. e4 and e3 are simultaneous too. Finally, an observer at the circumcenter of the triangle e1e4e3 will see that e1 and e3 are simultaneous.
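The proof relies on the fact that a non-degenerate triangle has a point equidistant from all three vertices, which is why e4 must not lie on any of the listed lines. A minimal sketch, assuming the three event locations are described by 2D coordinates in their common plane (the function name is hypothetical):

```python
import math


def circumcenter(a, b, c, tol=1e-12):
    """Return the point equidistant from the 2D points a, b and c.

    Raises ValueError if the points are collinear, in which case no
    symmetrically placed observer exists for all three at once.
    """
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < tol:
        raise ValueError("points are collinear: no circumcenter")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy


# Illustrative locations of e1, e2 and e4 (a right triangle).
e1, e2, e4 = (0.0, 0.0), (4.0, 0.0), (0.0, 3.0)
observer = circumcenter(e1, e2, e4)

# The observer is symmetrically placed with respect to every pair of vertices.
distances = [math.dist(observer, p) for p in (e1, e2, e4)]
print(observer, distances)
```

One observer at this point can therefore compare e1 with e2, e2 with e4, and e1 with e4 under the same definition of simultaneity.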
The last two theorems basically say that the definition of simultaneity is consistent.
For simplicity, it is assumed in the following that in every inertial frame, all clocks are already ticking in sync… just by coincidence.
So far the classical conception of time was untouched. What we’ve solely gained is a rather restricted, but rock-solid way of measuring time.
(Definition) In an inertial frame K, the time read off from the synchronized clocks is called the coordinate time of K.
The definition of simultaneity in K can now be extended to include those events that happen directly to entities in motion:
(Definition) Two events are simultaneous in K if and only if they happen at the same coordinate time.
The coordinate time of an event can be read off from the clock in K that is momentarily co-located with the entity to which the event happens.
As long as there is only one inertial frame considered, time seems no different from that of classical mechanics.
Let K denote an inertial frame.
(Assumption) Particle trajectories are continuous in K. That is, the t ↦ (x, y, z) relation is a continuous function, where t denotes the coordinate time in K and (x, y, z) the momentary location of the particle relative to K.
Notes: (a) backed by the fact that the speed of light is finite, the same can be assumed about signals too, (b) the assumption implies that particle speeds are always finite.
(Definition) In K, the distance between two particles at coordinate time t is the distance between their respective momentary locations.
This definition is compatible with the classical conception of distance, and includes the possibility that the particles are moving relative to K. So we rely on coordinate time to define the distance between moving particles.
As long as there is only one inertial frame considered, space seems no different from that of classical mechanics.
Experience vs. definition
As for the simultaneity of two or more events, naked-eye observers placed at arbitrary (non-equidistant) locations would typically come to different conclusions, due to the finiteness of the speed of light. This suggests that frame-wide simultaneity is not a direct experience but a mere definition, based on an agreed way of measurement. The same is true for coordinate time and distance, since they both build on the concept of simultaneity. (Already in the case of duration, two co-located observers need to use a clock in order to avoid ambiguity.)
It does no harm to imagine that the above concepts describe a directly intangible “objective reality” in an inertial frame, but as a matter of fact they are essentially just tools that help us in calculating answers to questions about our (more) direct experiences. Later we’ll see that the most important question to answer is this: between two events that happen directly to an observer, how much time elapses according to the observer’s own clock?
Inertial frames are universal:
(Assumption) There is a one-to-one correspondence between the (x, y, z, t) tuples of any two inertial frames.
That is, at any one moment, an observer in an inertial frame encounters exactly one location and sees exactly one clock time of another inertial frame. And if two observers meet for just a moment, they will agree on the two tuples they perceive (i.e. their own and that of the other). Moreover, given an inertial frame K, anything that can be labelled by any other inertial frame is visible in K too, hence the term “universal”.
Transitivity is assumed as well:
(Assumption) If two (x, y, z, t) tuples, of two inertial frames, both correspond to the same (x, y, z, t) tuple of a third inertial frame, they also correspond to each other.
Inertial frames revisited
(Alternative definition) An inertial frame is a reference frame in which no force is needed to keep a particle at rest.
Intuitively, this means that if all forces acting upon a particle at rest were “switched off”, it would continue to stay at rest.
(Assumption) The alternative definition and the original definition of inertial frame are equivalent. In other words, the law of inertia holds in all “alternative” inertial frames.
The below theorems characterize the relationship among inertial frames.
(Theorem) Let K be an inertial frame. Then every point of another inertial frame K’ moves at a constant velocity relative to K.
(Proof) Take an arbitrary point P’ in K’, and place a particle p there that is at rest in K’ and upon which no force is acting. Then, due to the law of inertia, p and P’ are moving at the same constant velocity in K.
Here we made use of the fact that if there is a force, it must exist in every inertial frame, for it is the “business” of the interacting objects only.
(Assumption) Given two points of K’, the distance between their corresponding points in K cannot grow arbitrarily large.
(Theorem) The constant velocity in the previous theorem is the same for every point of K’.
(Proof) Otherwise, the assumption just made would not hold. (I suspect the theorem could be proved without the assumption, but I don’t know how.)
In the opposite direction:
(Theorem) If a reference frame K’ moves at a constant velocity relative to an inertial frame K, then K’ is an inertial frame too.
(Proof) Let p be a particle that is at rest in K’. Then p is moving at a constant velocity relative to K, and due to the law of inertia it can be assumed that no force is acting upon p. Thus, no force is needed to keep p at rest in K’.
(Theorem) The classical theorem characterizing the relationship among inertial frames is also valid in special relativity.
Frames of rigid objects
By definition, every reference frame is fixed to a rigid object. That is, the particles of the rigid object are at rest relative to the reference frame.
Note: to lend “fixed” a meaning for non-inertial frames too, we can assume that time is “passing” at every single location of any reference frame.
In special relativity, the scope is limited to inertial frames. So let K denote an inertial frame. Then, what was said above means that for every inertial frame, there exists an underlying rigid object that is moving uniformly relative to K. To round off this relationship, it is reasonable to assume the converse too:
(Assumption) To every rigid object O that is moving uniformly relative to K, there exists an inertial frame relative to which O is at rest.
The following suffices for our purposes:
(Assumption) Particles can be destroyed: if a particle gets destroyed, its trajectory is discontinued in every inertial frame. That is, if tK denotes the time of destruction relative to inertial frame K, then the trajectory will not exist for any t > tK in K.
Note: backed by the fact that the speed of light is finite, the same can be assumed about signals too.
Universality alone would only guarantee that the event of destruction happens in every inertial frame, at exactly one (x, y, z, t) tuple. However, it states nothing about the existence of the particle before or after t. That’s what causality tells us in addition.
(Theorem) The order of two events e1 and e2 that happen directly to a given particle is the same relative to every inertial frame.
(Proof) Let K and K’ be two inertial frames, and let e1 and e2 happen at t1 and t2 in K, respectively, with t1 ≤ t2. Let t1’ and t2’ denote the corresponding time values in K’. Moreover, let the particle get destroyed in K at t2, which corresponds to t2’ in K’. Then, since the particle does exist in K’ at t1’, t1’ ≤ t2’ must hold. From universality, t1’ = t2’ if and only if t1 = t2. Thus, t1’ < t2’ if and only if t1 < t2.
The argument can be applied to signals too:
(Theorem) The order in which a signal meets two given entities is the same relative to every inertial frame.
Units of measurement
So far, each inertial frame has its own meter bars at rest to measure distances, clocks at rest to measure durations, and elastic solid objects (threads, springs, etc.) to measure forces.
(Assumption) Every rigid or elastic solid object can be brought (to be at rest) into any inertial frame.
Backed by the principle of relativity, one can venture to say that:
(Principle) A rigid or elastic solid object has the same mechanical (incl. geometrical) properties in every inertial frame.
From this, it naturally follows that:
(Corollary) The units of length, time, and force can be synchronized between any two inertial frames K and K’.
To synchronize e.g. length, take a rod that is 1 meter long in K, bring it “up to speed” into K’, and adjust the definition of 1 meter in K’ accordingly. Similar applies to time and force. After that, the measuring tools of K and K’ will be interchangeable in any mechanical experiment.
(Definition) Two inertial frames are of the same construction if and only if their units of measurement (length, time, and force) are synchronized.
K and K’
Going forward, K and K’ will always denote two inertial frames of the same construction, with K’ moving along the x-axis of K in the positive direction, at a constant velocity v ≠ 0. Since K and K’ are both inertial frames, K is also moving relative to K’ at some constant velocity v’ ≠ 0. Both v and v’ are finite, for the speed of every particle is finite.
Notes: (a) vectors are written in boldface, e.g. velocity v, while scalars in plain text, e.g. speed v = |v|, (b) we’ll always assume that all inertial frames are of the same construction.
x and x’
In this subsection, “location” is meant in the general sense, i.e. not as point-like location only.
(Theorem) If l is a straight line in K parallel to v, and l’ is the corresponding location in K’ marked out at (coordinate) time t in K, then l’ is a straight line parallel to v’.
(Proof) Viewed from K’, any point P of l is moving at the constant velocity v’. On the other hand, in K all points of l’, and only those from K’, go through P. Thus, during the course of its movement in K’, P meets exactly the points of l’. So l’ is a straight line parallel to v’.
(Theorem) The location l’ marked out in K’ is the same for every t, and l’ corresponds to l at any time t’ in K’.
(Proof) The points of K’ marked out at any t1 in K, and only those, are moving along l in K, so marking out at any other t2 in K results in the exact same points of K’. Now, switching the roles of K and K’, we get that for every t’ in K’, l’ corresponds in K to the very same straight line ll’ parallel to v. But from the way l’ was constructed we also know that ll’ and l have common points, so ll’ must be equal to l, since both of them are parallel to v.
The x’-axis of K’ will always be chosen such that it coincides with the x-axis of K, at all times. To obtain the x’-axis, all we have to do is to mark out the location in K’ that corresponds to the x-axis at any given time t in K. As for the orientation of the x’-axis, we define the order of its points, and with that the positive and the negative direction as well, by the order of the corresponding points on the x-axis at t. Independently of our choice of t, we’ll get the same x’-axis and the same order of its points.
(Theorem) In K’, v’ points in the negative direction of the x’-axis.
(Proof) In K, a point P of the x-axis meets the points of the x’-axis in the above defined negative direction. If we imagine there is a particle at P, then causality implies that P, while moving at velocity v’, meets the points of the x’-axis in the same order in K’ (and so the above defined “directions” are proper directions in K’ too).
For completeness, one last theorem about the x’-axis:
(Theorem) If in K’, point P2’ comes after P1’ on the x’-axis, then at any given t’, the corresponding point P2 comes after P1 on the x-axis of K.
(Proof) Both P1 and P2 are moving in the negative direction along the x’-axis in K’. At t’, P1 is already at P1’, so P2’ must have met P1 before t’. Therefore, P2’ meets in K’ first P1 then P2. Due to causality, the order is the same in K. And since P2’ is moving in the positive direction along the x-axis in K, P2 comes after P1.
v and v’
There is only one way an inertial frame can move relative to another at a given velocity:
(Lemma) If inertial frame K” is moving at v relative to K, then K’ and K” are at rest relative to each other, and the time values of each pair of co-located clocks of K’ and K” differ by the very same constant.
(Proof) Take a point P in K at time t. The corresponding points P’ and P” of K’ and K”, respectively, always coincide in K. So, due to universality (transitivity), the constant velocity at which K’ and K” are moving relative to each other must be 0. This is because at any two t1’ and t2’ in K’, P’ corresponds (through transitivity via K) to the same P” of K”, and similarly, P” in K” always corresponds to the same P’ of K’. Furthermore, since all clocks are synchronized in both K’ and K”, the coordinate times of K’ and K” can differ only by a constant.
Now we can prove an expected result:
(Theorem) As speeds, v’ = v; and thus, as velocity vectors, v’ = –v.
(Proof) Let Kw be an inertial frame moving at some speed 0 ≤ w ≤ v relative to K along the x-axis in the positive direction, and let the xw-axis of Kw be defined similarly to the x’-axis of K’. Relative to Kw: when w = 0, K’ is moving at (positive) speed v, while K at speed 0; when w = v, K’ is moving at speed 0, while K at (negative) speed v’. Because of continuity, there exists a w where K and K’ are moving relative to Kw at equal speed but in opposing directions. From the symmetry of that situation it follows that v’ = v.
Notes: (a) roughly speaking, “continuity” is the assumption that whenever a parameter is being changed continuously, the result is also changing continuously, (b) the lemma is necessary to establish the symmetry.
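The continuity step of the proof can be spelled out as an intermediate-value argument. In the sketch below, the signed velocities u_K(w) and u_{K'}(w) of K and K’ along the xw-axis, the function f, and the value w* are notation introduced here for illustration only; v and v’ denote the (positive) speeds.

```latex
\begin{align*}
f(w) &:= u_{K'}(w) + u_{K}(w), \\
f(0) &= v + 0 = v > 0, \\
f(v) &= 0 + (-v') = -v' < 0, \\
\text{continuity of } f &\;\Longrightarrow\; \exists\, w^{*} \in (0, v):\ f(w^{*}) = 0, \\
\text{i.e.}\quad u_{K'}(w^{*}) &= -\,u_{K}(w^{*}).
\end{align*}
```

At w = w*, the two frames recede from Kw at equal speed in opposite directions, which is the symmetric situation the proof appeals to.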
In the proof it was tacitly assumed that:
(Assumption) Kw exists for any w < v.
This is not as obvious as it looks; what if v > c, are we sure that w = c would be possible? Nevertheless, we maintain the assumption, as later it will be shown that independently of the above theorem, v < c holds.
t and t’
At a given location L in K, at (coordinate) time t, the corresponding time t’ in K’ is read off from the clock in K’ that is momentarily located at L. Since coordinate time is defined separately in each single inertial frame, it’s not guaranteed that t and t’ are equal, not even if K and K’ were at rest relative to each other.
If the observation at L spans a duration Δt = t_end – t_begin in K, then the corresponding duration in K’ is Δt’ = t’_end – t’_begin. If K and K’ were at rest relative to each other, it would be guaranteed that Δt = Δt’ holds. Because of causality, it is true in any case that:
(Theorem) Δt > 0 implies Δt’ > 0.
(Proof) Let there be a particle at L, at rest relative to K. Then, the beginning and the end (events) of the observation happen to the same particle.
We’ll show now that Δt’ depends solely on Δt:
(Theorem) If two observations, at L1 and L2 in K, beginning at t_begin1 and t_begin2, respectively, span the same duration Δt, the corresponding durations Δt1’ and Δt2’ in K’ are equal too.
(Proof) There must exist an inertial frame K”, also moving at velocity v relative to K, such that for the observation at L2 in K, the corresponding duration Δt2” in K” is equal to Δt1’ in K’. The rationale is that if such a K’ can exist for L1 and t_begin1 of K, there is no reason why a similar K” would not exist for L2 and t_begin2. And as K’ and K” are at rest relative to each other, the durations Δt2” and Δt2’, and thus Δt1’ and Δt2’ too, must be equal.
This means that for any positive integer n, if an observation takes n · Δt time at L in K, it will have a corresponding duration of n · Δt’ in K’. Due to continuity it follows that:
(Theorem) Δt’ = λ · Δt, where λ > 0 is a constant.
Notes: (a) the value of λ can only depend on the choice of K and K’, or rather only on v due to symmetry reasons, (b) in classical mechanics, λ = 1; in special relativity, we don’t yet know the exact value.
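The step from integer multiples to the linear law can be sketched as follows. Here f is notation introduced for illustration: it maps a duration in K to the corresponding duration in K’, which is well defined by the previous theorem.

```latex
\begin{align*}
f(\Delta t_1 + \Delta t_2) &= f(\Delta t_1) + f(\Delta t_2)
  && \text{(consecutive observations at the same location)} \\
f(n\,\Delta t) &= n\, f(\Delta t)
  && (n = 1, 2, 3, \dots) \\
f\!\left(\tfrac{p}{q}\,\Delta t\right) &= \tfrac{p}{q}\, f(\Delta t)
  && \text{(apply the previous line to } \Delta t / q\text{)} \\
f(\Delta t) &= \lambda\,\Delta t, \quad \lambda := f(1) > 0
  && \text{(extend from rational to real multiples by continuity)}
\end{align*}
```

The positivity of λ reflects the earlier theorem that Δt > 0 implies Δt’ > 0.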
Since v = v’, the situation of K and K’ is symmetrical, so the theorem will be valid with the very same λ if the roles of K and K’ are switched.
Let A and B be two points on a straight line l in K parallel to v, and let Δx = xB – xA denote the signed distance between them. Then, at any time t in K, for the corresponding time values in K’:
(Theorem) t’B – t’A = (Δx / v) · (1 / λ – λ).
(Proof) Let A’ denote the point in K’ that corresponds to A at time t1 in K. A’ needs Δt = t2 – t1 = Δx / v time to get from A to B in K. An observer at A in K will see that a corresponding λ · Δt time has elapsed in K’. To spell it out, at time t2 in K, at location A, the corresponding time in K’ is t’A = t1’ + λ · Δt. On the other hand, an observer at A’ in K’ will see that a corresponding Δt time has elapsed in K, and thus that Δt / λ time has elapsed in K’. To spell it out, at time t2 in K, at location B, the corresponding time in K’ is t’B = t1’ + Δt / λ.
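As a sanity check, the two steps of the proof can be replayed numerically and compared with the formula of the theorem. The sketch below uses illustrative values and hypothetical function names. The last check anticipates the standard Lorentz transformation, which has not been derived at this point of the essay, so it is an external assumption rather than part of the argument.

```python
import math


def k_prime_times(delta_x, v, lam, t1_prime=0.0):
    """Replay the two steps of the proof.

    delta_x: signed distance x_B - x_A in K
    v: speed of K' relative to K (nonzero)
    lam: the factor λ from Δt' = λ · Δt (positive)
    t1_prime: K' time at A' when it passes A (arbitrary offset)
    Returns (t'_A, t'_B), the K' times found at A and B at K time t2.
    """
    dt = delta_x / v                 # K time that A' needs to get from A to B
    t_a_prime = t1_prime + lam * dt  # observer at A sees λ · Δt elapse in K'
    t_b_prime = t1_prime + dt / lam  # observer at A' sees Δt / λ elapse in K'
    return t_a_prime, t_b_prime


def offset_closed_form(delta_x, v, lam):
    """Right-hand side of the theorem: (Δx / v) · (1 / λ − λ)."""
    return (delta_x / v) * (1.0 / lam - lam)


# Illustrative values only (assumptions, not taken from the essay).
C = 300_000_000.0
v = 0.6 * C
delta_x = 1000.0

for lam in (0.5, 1.0, 1.25):
    t_a, t_b = k_prime_times(delta_x, v, lam)
    assert math.isclose(t_b - t_a, offset_closed_form(delta_x, v, lam),
                        rel_tol=1e-9, abs_tol=1e-15)

# λ = 1 is the classical case: no offset along the line.
assert offset_closed_form(delta_x, v, 1.0) == 0.0

# External assumption: with λ equal to the Lorentz factor, the offset agrees
# with the relativity-of-simultaneity term of the Lorentz transformation.
gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
lorentz_offset = -gamma * v * delta_x / C**2
assert math.isclose(offset_closed_form(delta_x, v, gamma), lorentz_offset, rel_tol=1e-9)
```

The arbitrary offset t1’ cancels in the difference, so only Δx, v and λ matter.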
Therefore, unless λ = 1, the corresponding time t’ in K’ along l changes linearly with the x-coordinate, i.e. Δt’ ~ Δx. This has a curious implication:
(Theorem) If λ ≠ 1, the speed of particles has a finite upper bound.
(Proof) Otherwise, if e.g. t’A > t’B holds, a fast enough particle could get from A to B within such a short time that on its arrival at B, the corresponding time in K’ would still be less than t’A, and that would violate causality.
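The bound hinted at in the proof can be made explicit. The following sketch treats the case λ > 1 and Δx > 0, so that t’A > t’B; the particle speed u (relative to K) is notation introduced here, and v denotes the positive speed of K’.

```latex
\begin{align*}
\text{departure from } A:&\quad \text{time } t \text{ in } K,\ \text{time } t'_A \text{ in } K' \\
\text{arrival at } B:&\quad \text{time } t + \frac{\Delta x}{u} \text{ in } K,\
  \text{time } t'_B + \lambda\,\frac{\Delta x}{u} \text{ in } K' \\
\text{causality:}&\quad t'_B + \lambda\,\frac{\Delta x}{u} > t'_A \\
\Longrightarrow&\quad \lambda\,\frac{\Delta x}{u} > t'_A - t'_B
  = \frac{\Delta x}{v}\left(\lambda - \frac{1}{\lambda}\right) \\
\Longrightarrow&\quad u < \frac{\lambda^{2}}{\lambda^{2} - 1}\, v
\end{align*}
```

For λ < 1 the same reasoning, applied to a particle sent from B to A, would give u < λ² · v / (1 – λ²). In both cases the bound is finite exactly because λ ≠ 1.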
y’ and z’
(Theorem) Let P and Q be two points in K whose x-coordinates are equal. Then, for any t in K, the x’-coordinates of the corresponding P’ and Q’ in K’ are equal too.
(Proof) Due to the symmetrical situation of P and Q in K with respect to v, and owing to the fact that there is only one way K’ can move at v relative to K, there is no reason why the situation of P’ and Q’ would be asymmetrical in K’ with respect to v’, or in other words that the P’Q’ segment would tilt in a non-symmetrical way in K’.
Analogously for time:
(Theorem) The corresponding t’P and t’Q values in K’ are equal as well.
Putting the above two theorems together (the roles of K and K’ can be switched in both):
(Corollary) A plane S in K that is perpendicular to the x-axis is perceived in K’ at any given time t’ as a plane S’ that is perpendicular to the x’-axis.
Notes: (a) S’ is different for different t’ values, (b) every point of S is moving at v’ relative to K’, which means that S itself is moving in K’ at v’, (c) strictly speaking, we should say “measured” instead of “perceived”.
Next, we’ll explore how straight lines on S map onto S’.
(Theorem) If M is the middle point of segment PQ on S, then for the corresponding points in K’, marked out at any t in K, M’ is also the middle point of segment P’Q’ on S’.
(Proof) Due to the symmetrical situation of PM and MQ in K with respect to v, and owing to the fact that there is only one way K’ can move at v relative to K, there is no reason why the length of P’M’ and M’Q’ would not be equal in K’. Similarly, there is no reason why M’ would fall on one side (and not the other) of the straight line determined by P’Q’ on S’.
Because of continuity, we can go on and say (again, meaning “location” in the general sense):
(Theorem) If l is a straight line (segment) on S, and l’ is the corresponding location in K’ marked out at any t in K, then l’ is a straight line (segment) on S’ too.
Note: every point of l is moving at v’ relative to K’, which means that l itself is moving in K’ at v’.
Finally, we’ll show that S and S’ look exactly the same.
(Theorem) Let r in K be a straight line segment perpendicular to the x-axis. Then, the length of the corresponding segment r’ in K’, i.e. the perceived length of r in K’, is equal to that of r in K.
(Proof) Let t’ denote the time in K’ that corresponds, along r, to a given t in K. Let q be the segment in K such that: (a) it is parallel to r, (b) its middle point coincides with that of r, and (c) its length in K is equal to that of r’ in K’. Since the situation of K and K’ is symmetrical (for v = v’), the corresponding q’ in K’, marked out at t, must have the same length (in K’) as that of r in K. Due to the symmetrical placement of r and q in K, q contains r or r contains q. However, if e.g. q properly contained r in K, then q’ would also properly contain r’ in K’; the first part would imply that the length of r’ in K’ is greater than that of r in K, while the second part would imply just the opposite, which would be a contradiction. So q, and thus r’ in K’, must be exactly as long as r in K.
Note: the letter “r” was used because it is the first letter of the word “rod”.
Let d and d’ denote distance in K and K’, respectively:
(Corollary) If P and Q are two points in K whose x-coordinates are equal, then d(P, Q) = d'(P’, Q’).
Let ∠ and ∠’ denote angles in K and K’, respectively:
(Theorem) If P, Q, and R are three points in K whose x-coordinates are equal, then for the angles at Q and Q’, ∠(P, Q, R) = ∠'(P’, Q’, R’) holds.
(Proof) The triangles PQR in K and P’Q’R’ in K’ are congruent because their corresponding sides are equal in length.
The y’- and z’-axes of K’ will always be chosen such that they are parallel to, and have the same orientation as, the y- and z-axes of K, respectively, at all times.
As it can be rightly suspected by now, in special relativity the interesting things happen along the x-axis.
If the x-, y-, and z-axes constitute a right-handed coordinate system, then so do the x’-, y’-, and z’-axes when viewed from K. But do the x’-, y’-, and z’-axes have the same handedness when they are viewed from K’ instead of K? (Remember that the axes of K’ were set up entirely from within K.)
What we can do is to consider handedness a mechanical property of rigid objects (or rather, of the arrangements of their parts). Then, K and K’ can be synchronized in this aspect via bringing a rigid object of known handedness from K into K’. So basically, the question is whether a materialized unit cube of coordinate system K can be brought “up to speed” into K’ such that it could seamlessly replace the unit cube of coordinate system K’.
(Theorem) K and K’ are of the same handedness.
(Proof) We know from the proof of v = v’ that there exists a Kw relative to which K and K’ are moving at equal speed but in opposing directions. Let the axes of Kw be defined similarly to those of K’. Due to causality, K’ is moving in the positive while K in the negative direction along the xw-axis. Now, let’s reverse the orientation of the x- and the xw-axes of K and Kw, respectively. Then, due to symmetry, the unit cube of the original Kw can be brought to perfectly match the unit cube of K’ if and only if the same can be done between the reversed Kw and the reversed K too. Thus, since the reversed and the original Kw have opposite handedness, K’ and the reversed K must also have opposite handedness; and so K and K’ must have the same.
Note: handedness will not play any role later in this essay.
Straight lines in K and K’
Let S1 and S2 be two planes in K perpendicular to the x-axis, and let Δx = x2 – x1 denote the signed distance between them, i.e. the difference between their respective x-coordinates. In K’, let S1’ and S2’ be the corresponding planes perpendicular to the x’-axis, marked out at a given t in K.
Using a similar argument to that when λ was introduced, it can be easily seen that for the corresponding signed distance Δx’ in K’:
(Lemma) Δx’ = μ · Δx, where μ > 0 is a constant.
Notes: (a) the value of μ can only depend on the choice of K and K’, or rather only on v due to symmetry reasons, (b) in classical mechanics, μ = 1.
Since distances on planes perpendicular to the x-axis don’t change, it follows immediately that (yet again, meaning “location” in the general sense):
(Theorem) If l is a straight line (segment) in K, and l’ is the corresponding location in K’ marked out at time t in K, then l’ is a straight line (segment) too.
(Proof) It’s due to the proportionality between each of Δx and Δx’, Δy and Δy’, Δz and Δz’.
But how is l perceived in K’? There is no guarantee that at time t in K, all points of l have the very same corresponding t’ values in K’. (In classical mechanics it is guaranteed that they all do, since λ = 1.)
(Theorem) l is perceived in K’ as a straight line (segment).
(Proof) Let P be a fixed point on l in K, and P’ the corresponding point in K’ at a given time t’, marked out at an appropriate t in K. Now, let Q be another point on l in K, with $\Delta x = x_Q - x_P$ being the difference between the x-coordinates of Q and P. It was shown earlier that for the corresponding time value $t'_Q$ in K’, read off at t in K, the deviation from t’ is proportional to Δx, i.e. $t'_Q - t' \sim \Delta x$. This also entails $t'_Q - t' \sim \Delta x'$, as Δx = (1 / μ) · Δx’. Thus, to find out where Q was (or will be) in K’ at t’, we have to shift it away from l’, along v’, by the (signed) amount of $-(t' - t'_Q) \cdot v'$. Since this amount is proportional to Δx’, shifting all points on l’ accordingly will result in a straight line (segment) in K’.
Note: it will result in a “proportional” straight line (segment) in K’, meaning that e.g. middle points in K are perceived as middle points in K’ too.
Making use of v’ = v, the following relationship can be derived:
(Theorem) μ = λ.
(Proof) An observer in K, located on S2, sees that Δt = Δx / v time elapsed between meeting S2’ and S1’. In addition, the observer also sees a corresponding time Δt’ = λ · Δt elapsed in K’. This means that in K’, S2 travelled for Δt’ = λ · Δx / v time from S2’ to S1’. And since S2 is moving at v relative to K’, the distance between S1’ and S2’ is Δx’ = v · Δt’ = λ · Δx. Comparing this with Δx’ = μ · Δx gives μ = λ.
Consequently, if there was a rod r’ in K’, parallel to the x’-axis and reaching from S1’ to S2’, its length would be Δx’ in K’, but an observer in K would see its length as Δx.
(Corollary) In K, the perceived length of r’ is 1 / λ times its rest length in K’.
(Theorem) If a particle is moving at a constant velocity relative to K, it’s also moving at a constant velocity relative to K’.
Although this follows immediately from the law of inertia, let me demonstrate it differently and explain the rationale afterwards.
(Proof) Let P, Q, and M be points on the trajectory of the particle in K, such that d(P, M) = d(M, Q). Let P’, Q’, and M’ denote the corresponding points on the trajectory in K’, respectively. Due to symmetry reasons: (a) projecting P, Q, and M, each when the particle is passing by, onto the x’-, y’-, and z’-axes yields the same Δx’, Δy’, and Δz’ values between P’ and M’ as it does between M’ and Q’; (b) the respective Δt’ values are the same too. So in K’, the (average) velocity of the particle between P’ and M’ is equal to that between M’ and Q’. Thus, due to continuity, the velocity of the particle is constant in K’.
The advantage of this proof is that it’s valid not only for particles but also for any moving entity, including the tip of a ray of light, or even just a point-like state propagating through K.
Note: the proof does not exclude the possibility that Δt’ = 0, i.e. that the speed in K’ is “infinite”.
Relativistic effects – Part 1
In the following, the consistent application of the previously defined concepts, in combination with the constancy of the speed of light, will lead to concrete conclusions about space and time that are foreign to classical mechanics.
Speed of light
(Law) A given ray of light propagates at the same speed, c ≈ 300,000,000 m/s, relative to every inertial frame.
In point-like terms, this means that the tip of the ray of light is moving at c in every inertial frame.
(Corollary) The speed of light in an inertial frame is independent of the state of motion of the entity that emits it.
It’s important to emphasize here that the means of measurement, i.e. coordinate time and distance, were defined based on symmetry considerations backed by the principle of relativity, without making any use of the physical properties of light. And although the above law is surprising, the fact that it’s nevertheless in line with the principle of relativity corroborates that principle.
Upper limit of speed
Let AB be a segment of non-zero length in K, parallel to the x-axis, with $x_A < x_B$. We saw before that for the corresponding points A’ and B’ in K’, respectively, marked out at any t in K, $x'_{A'} < x'_{B'}$ holds.
In K’, let’s send a light signal from A’ toward B’. It will arrive at B’ after some time Δt’ > 0. For the corresponding duration in K, Δt > 0 must hold due to causality. (Relative to K’, the light signal meets first A’ and then B’, so it happens in the same order relative to K too.)
Since the light signal is moving at a constant velocity relative to K’, it has to do likewise relative to K as well. Furthermore, both A’ and B’ are moving at v relative to K, and when the light signal is emitted, B’ is ahead of A’. Later the light signal meets B’, which then means it is traveling in K along a straight line parallel to the x-axis, in the positive direction. And by law, at the constant speed c. The fact that it does catch up with B’ implies:
(Theorem) v < c.
The relationship between inertial frames and rigid objects entails that particles too have exactly the same speed limit.
Let A’B’ be a segment of non-zero length in K’, perpendicular to the x’-axis. We saw before that A’B’ is perceived in K as being perpendicular to the x-axis, traveling at velocity v.
In K’, let’s send a light signal from A’ toward B’. It will arrive at B’ after some time Δt’ > 0. Let AB and CD denote the segments in K that correspond to A’B’ when the light signal is sent and arrives, respectively. Moreover, let e1 and e2 be two events, both happening at B’, the first when the light signal is sent and the second when it arrives. That is, viewed from K, e1 happens at B and e2 at D. Clearly, Δt’ is the time difference between e1 and e2 in K’. For the corresponding time difference in K, Δt > 0 must hold due to causality. (Imagine there is a particle at B’ to which both e1 and e2 happen.) And since B’ is moving at v relative to K, the x-coordinate of CD is greater than that of AB.
Again, as the light signal travels at a constant velocity relative to K too, it has to travel in K along a straight line from A toward D, at the constant speed c. It was shown before that d(A, B) = d(C, D) = d'(A’, B’). Let d denote this common distance. In K’, the light signal then needs
Δt’ = d / c
time to travel from A’ to B’. We also know that:
d(A, D) = c · Δt
d(B, D) = v · Δt
Due to the Pythagorean theorem:
$d^2 = (c \cdot \Delta t)^2 - (v \cdot \Delta t)^2$
Replacing d with c · Δt’ and rearranging:
$\Delta t' = \Delta t \cdot \sqrt{1 - v^2 / c^2}$
This means, from the perspective of an observer at B’, that:
(Theorem) If an observer is at rest for Δt’ time in K’, they observe a corresponding $\Delta t = \Delta t' / \sqrt{1 - v^2 / c^2}$ time elapsed in K.
So the duration between events e1 and e2 is different in K and K’, or more generally, the (coordinate) time that elapses between two events depends on the inertial frame in which it is measured.
As a byproduct, we have just determined the value of λ as well:
(Corollary) $\lambda = 1 / \sqrt{1 - v^2 / c^2}$.
The effect is called time dilation since λ > 1, and thus Δt > Δt’, for any v ≠ 0. In other words, an observer at rest in K’ will always see time in K passing faster than their own.
Note: time dilation is unnoticeable in our everyday life.
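To get a feel for the size of the effect, here is a minimal Python sketch of the corollary above. The function names (`lorentz_factor`, `elapsed_in_K`) and the two example speeds are illustrative choices, not part of the derivation; the value of c used is the exact SI one rather than the rounded figure quoted earlier.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value)

def lorentz_factor(v: float, c: float = C) -> float:
    """Return lambda = 1 / sqrt(1 - v^2 / c^2) for a speed |v| < c."""
    beta2 = (v / c) ** 2
    if beta2 >= 1.0:
        raise ValueError("speed must be below c")
    return 1.0 / math.sqrt(1.0 - beta2)

def elapsed_in_K(dt_prime: float, v: float, c: float = C) -> float:
    """Time elapsed in K while an observer at rest in K' waits dt_prime."""
    return dt_prime * lorentz_factor(v, c)

# One hour spent at rest in K', for an airliner-like speed and for 0.6 c.
for speed in (250.0, 0.6 * C):
    extra = elapsed_in_K(3600.0, speed) - 3600.0
    print(f"v = {speed:.3e} m/s: K runs ahead by {extra:.3e} s per hour in K'")
```

At everyday speeds the difference is on the order of a nanosecond per hour, which is why the effect goes unnoticed; at 0.6 c it amounts to a quarter of an hour.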
Due to symmetry reasons, the previous theorem is valid in any direction in K, not only along the x-axis:
(Theorem) If an observer is moving at velocity w relative to K, then after Δt time has elapsed in K, the observer’s own clock will show only a corresponding $\Delta t' = \Delta t \cdot \sqrt{1 - w^2 / c^2}$.
The formula works for polygonal trajectories as well, provided that the speed is the same along all edges.
(Assumption) Any motion (of a point-like entity) can be approximated with arbitrary accuracy by polygonal motion.
This makes the formula valid for any motion of constant speed; even for circular ones as a matter of fact.
Simultaneity in K and K’
Knowing the value of λ, we can tell exactly how t’ changes along the x-axis:
(Theorem) At any given time t in K, the corresponding time t’ in K’ decreases as the x-coordinate increases. For an increase of Δx, there is a decrease in t’ by $\Delta x \cdot (v / c^2) / \sqrt{1 - v^2 / c^2}$.
So two events that are simultaneous in K do not happen at the same time in K’ unless they have the same x-coordinate.
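The size of this offset can be evaluated with a short Python sketch. The function name `t_prime_offset` and the sample values are assumptions made for illustration only.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def t_prime_offset(dx: float, v: float, c: float = C) -> float:
    """Change in t' between two points of K that lie dx apart along the
    x-axis, read off at the same time t in K (negative means t' decreases)."""
    return -dx * (v / c**2) / math.sqrt(1.0 - (v / c) ** 2)

# Two events simultaneous in K, one light-second apart along x, with v = 0.6 c.
print(t_prime_offset(C, 0.6 * C))      # offset in seconds
# Same x-coordinate: the events stay simultaneous in K' as well.
print(t_prime_offset(0.0, 0.6 * C))
```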
One might think of simultaneity as a bond between certain events. If two events in K happen at the same coordinate time, they are connected by “now”.
However, two events that happen at different times in K can also be connected by “now”, provided there exists another inertial frame relative to which the very same two events happen simultaneously. Moreover, assuming that the bond is transitive, it can be shown that in fact any two events are connected this way, i.e. all the events of the world (ever) are simultaneous. Yet at our outset, non-simultaneous events did seem to exist.
Instead of spending a lot of time figuring out what this really means, we would rather discard this elusive bond, with its flavor of action at a distance, and interpret inertial-frame-wide simultaneity as a mere definition without any immediate physical content.
The conception of time in special relativity has been much influenced by the classical idea of absolute time. Absolute time can be represented by a geometrical straight line T. Points on T correspond to moments, while distances to durations. Every event is mapped to exactly one point on T. In other words, there exist absolute temporal relationships among events.
Looking back, the tacit motivation has always been to gradually reintroduce absolute time into the new theory, by providing feasible ways of measuring it in increasingly general situations:
Local time: multiple observers at the same location in an inertial frame all sense the same objective time passing that can be measured by a co-located (local) clock.
At that point, the possibility was open that all local clocks in fact measure absolute time, identifying points and distances on T.
Inertial frame time: the expectation that objective simultaneity must then exist within a single inertial frame led to a well-defined coordinate time, measured by the synchronized local clocks in the inertial frame.
At that point, the possibility was open that coordinate time in fact (necessarily) measures absolute time on T as well.
After that, our unspoken attempt came to a dead-end when multiple inertial frames were considered: it was demonstrated via the time dilation effect that coordinate time cannot measure absolute time, since both simultaneity and duration are inertial frame dependent. To be precise, from the perspective of absolute time what was demonstrated is this: when two inertial frames are moving relative to each other, it cannot be that the coordinate time in both of them measures absolute time. Yet the principle of relativity suggests that the situation of the two inertial frames must be symmetrical. It would not fit in the picture if it were possible that the coordinate time in one frame is absolute, while in the other it’s not… and to make matters worse, there would be no known way to tell which frame is which.
This is the point where we must realize that time dilation has left us with no tangible evidence but only one thing supporting the idea of absolute time: our imagination, i.e. the intuition that we’ve gained from the limited spectrum of our day-to-day experiences. Time dilation has refuted the strongest argument we thought we had for absolute time. Namely, it’s not true that two clocks of the same construction always show the same time elapsed between any two of their encounters. To test this, we don’t even need to define simultaneity and coordinate time, all we need is the two clocks.
That’s why, since it does not seem to help to go back and try to adjust the simple and robust definitions that led us here, the choice has been made in special relativity to rather give up absolute time.
Let a rod be at rest in K’, lying parallel to the x’-axis. Knowing the value of λ, we can say that:
(Theorem) The length of the rod contracts by a factor of $\sqrt{1 - v^2 / c^2}$ when viewed from K.
So the distance between the endpoints of the rod is different in K and K’, or more generally, the distance that spatially separates two events (i.e. two entities at two given moments, respectively) depends on the inertial frame in which it is measured.
Note: length contraction is unnoticeable in our everyday life.
Due to symmetry reasons, the previous theorem is valid in any direction in K, not only along the x-axis:
(Theorem) If a rod is moving lengthwise at velocity w relative to K, its rest length contracts by a factor of $\sqrt{1 - w^2 / c^2}$ when viewed from K.
The theorem is valid for any uniformly moving rigid objects. The contraction happens along the direction of movement, while there is no change in size along perpendicular directions.
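As a small illustration, the following Python sketch applies the contraction factor to a box moving along the x-axis. The names `contracted_length` and `perceived_box`, and the choice of units with c = 1, are assumptions made for this example.

```python
import math

def contracted_length(rest_length: float, w: float, c: float = 1.0) -> float:
    """Length perceived in K of a rod moving lengthwise at speed w."""
    return rest_length * math.sqrt(1.0 - (w / c) ** 2)

def perceived_box(rest_dims, w: float, c: float = 1.0):
    """Perceived (x, y, z) sizes of a box moving along the x-axis at speed w.
    Only the size along the direction of motion changes."""
    lx, ly, lz = rest_dims
    return (contracted_length(lx, w, c), ly, lz)

# A unit cube moving at 0.8 c: shortened along x, unchanged along y and z.
print(perceived_box((1.0, 1.0, 1.0), 0.8))
```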
The absolute space is a 3-dimensional geometric space U. Every event is mapped to exactly one point in U. In other words, there exist absolute spatial relationships (e.g. distance) among events.
In this section, “location” is meant again in the general sense, i.e. not as point-like location only.
We saw in classical mechanics that there is no known way to permanently mark the locations of absolute space. We can only make sure that the locations are identified momentarily, via the coordinates of any one reference frame. In spite of this shortcoming, the following argument still strongly supports the idea of absolute space in classical mechanics:
Imagine that time was frozen at an (absolute) moment. Then, for that one moment, all observers in every reference frame would see the very same space. All physical objects, as well as the locations they naturally mark, would have the exact same shapes, sizes and arrangement, for each individual observer. This way, in classical mechanics the absolute space can be exhibited.
In special relativity, however, this thought experiment cannot be carried out. Due to time dilation, freezing the “present moment” in one inertial frame would freeze a continuum of moments in any other inertial frame that is moving relative to it. If we decide not to require that the very same moment (i.e. coordinate time) gets frozen at all points of an inertial frame, it would still not be possible, due to length contraction, that the distance between any two given particles is the same in all (frozen) inertial frames. And even if there was a frozen inertial frame whose spatial relationships did in fact match those of the absolute space, there would be no known way to tell which one it is.
At this point, similarly to the case of absolute time, we must realize that we cannot exhibit absolute space in a tangible way, and thus the only thing left in support of it is our outdated intuition. Again, as the principle of relativity suggests that the situation of all inertial frames must be symmetrical, the choice has been made in special relativity to give up absolute space.
Relativistic effects – Part 2
After having introduced the basic effects in Part 1, a couple of more involved problems will be tackled here by applying the previously derived results.
Addition of velocities
Let a particle be moving at velocity w’ in K’, and let $w'_{x'}$, $w'_{y'}$, and $w'_{z'}$ denote the (signed) x’, y’, and z’ components of w’, respectively. That is, ${w'}^2 = {w'_{x'}}^2 + {w'_{y'}}^2 + {w'_{z'}}^2$. To obtain the corresponding $w_x$, $w_y$, and $w_z$ in K, we take an arbitrary segment of the particle’s trajectory in K’, say, of Δt’, Δx’, Δy’, and Δz’, and calculate the corresponding Δt, Δx, Δy, and Δz values in K:
$\Delta x = \Delta x' \cdot \sqrt{1 - v^2/c^2} + v \cdot \Delta t = w'_{x'} \cdot \Delta t' \cdot \sqrt{1 - v^2/c^2} + v \cdot \Delta t$
$\Delta t = \left(\Delta x' \cdot \sqrt{1 - v^2/c^2} \cdot (v / c^2) / \sqrt{1 - v^2/c^2} + \Delta t'\right) / \sqrt{1 - v^2/c^2} = (1 + w'_{x'} \cdot v / c^2) \cdot \Delta t' / \sqrt{1 - v^2/c^2}$
$\Delta y = \Delta y' = w'_{y'} \cdot \Delta t'$
$\Delta z = \Delta z' = w'_{z'} \cdot \Delta t'$
The resulting velocities are:
$w_x = \Delta x / \Delta t = w'_{x'} \cdot (1 - v^2/c^2) / (1 + w'_{x'} \cdot v / c^2) + v = (w'_{x'} + v) / (1 + w'_{x'} \cdot v / c^2)$
$w_y = \Delta y / \Delta t = w'_{y'} \cdot \sqrt{1 - v^2/c^2} / (1 + w'_{x'} \cdot v / c^2)$
$w_z = \Delta z / \Delta t = w'_{z'} \cdot \sqrt{1 - v^2/c^2} / (1 + w'_{x'} \cdot v / c^2)$
(Theorem) w’ < c if and only if w < c.
(Proof) w’ < c means that ${w'_{y'}}^2 + {w'_{z'}}^2 < c^2 - {w'_{x'}}^2$. Then $w^2 = w_x^2 + w_y^2 + w_z^2 < \left((w'_{x'} + v)^2 + (c^2 - {w'_{x'}}^2) \cdot (1 - v^2/c^2)\right) / (1 + w'_{x'} \cdot v / c^2)^2 = c^2$. The other direction follows from the interchangeability of K and K’.
This is in line with the earlier established speed limit for particles.
(Theorem) If v → c, then wx → c, wy → 0, and wz → 0.
Whenever v, w’x’, w’y’, w’z’ ≪ c, the equations yield values very close to the classical ones, i.e. wx ≈ w’x’ + v, wy ≈ w’y’, and wz ≈ w’z’.
The formulas are valid also for the tip of a ray of light, i.e. when w’ = c. (The formulas can be derived without making any use of the fact that we are talking about a particle.)
(Theorem) w’ = c if and only if w = c.
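The velocity addition formulas of this section can be tried out numerically. The sketch below is only an illustration: the helper names `add_velocity` and `speed` are made up here, and K’ is assumed to move at v along the x-axis of K, as throughout the essay.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def add_velocity(w_prime, v, c=C):
    """Map the components (w'x', w'y', w'z') measured in K' to
    (wx, wy, wz) in K, where K' moves at v along the x-axis of K."""
    wxp, wyp, wzp = w_prime
    denom = 1.0 + wxp * v / c**2
    root = math.sqrt(1.0 - (v / c) ** 2)
    return ((wxp + v) / denom, wyp * root / denom, wzp * root / denom)

def speed(w):
    """Magnitude of a velocity given by its three components."""
    return math.sqrt(sum(comp * comp for comp in w))

# Half the speed of light "plus" half the speed of light stays below c.
print(speed(add_velocity((0.5 * C, 0.0, 0.0), 0.5 * C)) / C)
# A light ray sent along the y'-axis of K' still travels at c in K.
print(speed(add_velocity((0.0, C, 0.0), 0.6 * C)) / C)
# Everyday speeds: the result is very close to the classical sum.
print(add_velocity((30.0, 0.0, 0.0), 20.0)[0])
```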
Faster than light?
Imagine that all points of a directed straight line l’ in K’ are of black color. Let $P'_0$ be a point on l’, and let every point P’ of l’, just by coincidence, switch its color from black to red at coordinate time $t'_{P'} = d'(P'_0, P') / w'$, where d’ is a signed distance and w’ is an arbitrary positive constant. The resulting speed at which the redness property, i.e. the red ray (or the tip of the red ray, in point-like terms), propagates along l’ is w’. Curiously, w’ can be greater than c, and the formulas for velocity addition remain valid even in that case.
Using the notations of the previous section:
(Theorem) w’ > c if and only if w > c.
If $w'_{x'} = -c^2 / v$, then Δt = 0 for any Δx, which means that $w_x$ (and thus w too) is “infinite”, at least in the sense that the red color appears at the very same moment along a whole straight line in K. This seems to violate the continuity assumption of trajectories.
If $w'_{x'} < -c^2 / v$, then Δt < 0 for any Δt’ > 0, which means that the red ray travels “backward in time”, in the sense that the points of l’ are becoming red in the opposite order when viewed from K. This seems to violate causality at first sight.
Note: whenever $w'_{x'} < -c$, there always exists a v < c such that $w'_{x'} < -c^2 / v$ holds, from which it follows that no signal can travel faster than light.
However, it’s important to emphasize that in these examples neither rigid objects nor signals, but only a state of space, made up of independent point-like properties, is propagating. A red ray that is faster than light cannot be “sent”, it can only arise either due to coincidence or pre-arrangement. Nevertheless, the phenomenon is well-defined and can be considered when speculating about hypothetical particles moving faster than light.
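The three regimes of the red ray can be checked with a few lines of Python, using the expression for Δt from the velocity addition section. The function name `dt_in_K` and the units (c = 1, v = 0.5, so that the threshold -c²/v equals -2) are assumptions chosen to keep the numbers simple.

```python
import math

def dt_in_K(dt_prime: float, wxp: float, v: float, c: float = 1.0) -> float:
    """Duration in K of a stretch of the red ray that takes dt_prime in K'
    while advancing at speed wxp along the x'-axis."""
    return (1.0 + wxp * v / c**2) * dt_prime / math.sqrt(1.0 - (v / c) ** 2)

v = 0.5  # relative speed of K and K'; the threshold -c^2 / v is then -2
for wxp in (-1.5, -2.0, -3.0):
    dt = dt_in_K(1.0, wxp, v)
    if dt > 0:
        verdict = "same order of events in K"
    elif dt == 0:
        verdict = "whole line turns red at once in K"
    else:
        verdict = "reversed order of events in K"
    print(f"w'x' = {wxp}: dt = {dt:+.4f} ({verdict})")
```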
Acceleration and momentarily comoving inertial frames
At (coordinate) time t in K, let v be the velocity of a non-uniformly moving particle. Let t’ denote the corresponding time in K’ that is read off at the particle’s momentary location in K at time t.
(Definition) K’ is called a momentarily comoving inertial frame of the particle at time t.
It is comoving because in K’ the particle is momentarily at rest at time t’; one can see that by applying the velocity addition formulas. (The derivation of the formulas can be adjusted in a straightforward manner to account for the case of non-uniform motion too.)
Let the particle’s acceleration be a’ ≠ 0 relative to K’ during the time interval [t’ – Δt’; t’ + Δt’], and let $a'_{x'}$, $a'_{y'}$, and $a'_{z'}$ denote the (signed) x’, y’, and z’ components of a’, respectively. That is, ${a'}^2 = {a'_{x'}}^2 + {a'_{y'}}^2 + {a'_{z'}}^2$. To obtain the corresponding $a_x$, $a_y$, and $a_z$ in K, we take the [t’; t’ + Δt’] segment of the particle’s trajectory in K’, and calculate the corresponding Δt, $w_x$, $w_y$, and $w_z$ values in K:
$w'_{x'} = a'_{x'} \cdot \Delta t'$
$\Delta x' = (1/2) \cdot a'_{x'} \cdot {\Delta t'}^2 = (1/2) \cdot w'_{x'} \cdot \Delta t'$
$\Delta t = \left(\Delta x' \cdot \sqrt{1 - v^2/c^2} \cdot (v / c^2) / \sqrt{1 - v^2/c^2} + \Delta t'\right) / \sqrt{1 - v^2/c^2} = (1 + (1/2) \cdot w'_{x'} \cdot v / c^2) \cdot \Delta t' / \sqrt{1 - v^2/c^2}$
$w_x = (w'_{x'} + v) / (1 + w'_{x'} \cdot v / c^2)$
$w'_{y'} = a'_{y'} \cdot \Delta t'$
$w_y = w'_{y'} \cdot \sqrt{1 - v^2/c^2} / (1 + w'_{x'} \cdot v / c^2)$
$w'_{z'} = a'_{z'} \cdot \Delta t'$
$w_z = w'_{z'} \cdot \sqrt{1 - v^2/c^2} / (1 + w'_{x'} \cdot v / c^2)$
We get to the acceleration components when Δt’ → 0:
$a_x = \lim_{\Delta t' \to 0} (w_x - v) / \Delta t = a'_{x'} \cdot (1 - v^2/c^2)^{3/2}$
$a_y = \lim_{\Delta t' \to 0} (w_y - 0) / \Delta t = a'_{y'} \cdot (1 - v^2/c^2)$
$a_z = \lim_{\Delta t' \to 0} (w_z - 0) / \Delta t = a'_{z'} \cdot (1 - v^2/c^2)$
(Theorem) $a \le a' \cdot (1 - v^2/c^2) \le a'$, where both inequalities are strict whenever v ≠ 0 and $a'_{x'} \ne 0$.
(Theorem) If v → c, then a → 0.
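The limits above can be double-checked numerically by evaluating the finite-difference quotients for a small Δt’ and comparing them with the closed-form components. This is a sketch under stated assumptions: units with c = 1, a step of 10⁻⁶, and the function names `accel_in_K` and `accel_finite_difference` are all choices made for the example.

```python
import math

def accel_in_K(a_prime, v, c=1.0):
    """Closed-form (ax, ay, az) in K from the comoving-frame components."""
    g2 = 1.0 - (v / c) ** 2
    axp, ayp, azp = a_prime
    return (axp * g2 ** 1.5, ayp * g2, azp * g2)

def accel_finite_difference(a_prime, v, dt_prime=1e-6, c=1.0):
    """Approximate the same components from the velocity addition
    formulas applied over a short interval dt_prime in K'."""
    axp, ayp, azp = a_prime
    wxp, wyp, wzp = axp * dt_prime, ayp * dt_prime, azp * dt_prime
    root = math.sqrt(1.0 - (v / c) ** 2)
    denom = 1.0 + wxp * v / c**2
    dt = (1.0 + 0.5 * wxp * v / c**2) * dt_prime / root
    wx = (wxp + v) / denom
    wy = wyp * root / denom
    wz = wzp * root / denom
    return ((wx - v) / dt, wy / dt, wz / dt)

a_prime = (1.0, 2.0, 0.0)
for v in (0.0, 0.6, 0.99):
    exact = accel_in_K(a_prime, v)
    approx = accel_finite_difference(a_prime, v)
    print(v, exact, approx)
```

The two results agree to within the step size, and the components in K shrink toward zero as v approaches c.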
Accelerating by force
If we keep accelerating in K a particle of mass m > 0, from rest, by exerting a constant force F ≠ 0, it will have the same magnitude of acceleration, a’ = F / m, in each of its momentarily comoving inertial frames along the way. (In classical mechanics, the magnitude of acceleration would be the same F / m in every inertial frame, not only in the momentarily comoving ones.)
Let’s divide the motion of the particle in K into infinitely many sections as follows: the first section starts at v = 0; then, iteratively, the next section will always start as soon as the speed has reached v + (c – v) / 2. So the first section is [0; c / 2), the second is [c / 2; c · 3 / 4), and so on. If $\Delta t_v$ denotes the duration of the section in K that started at speed v, and $a_v$ the magnitude of the particle’s acceleration at the beginning of that section relative to K, then:
$\Delta t_v > \frac{(c - v)/2}{a_v} = \frac{(c - v)/2}{a' \cdot (1 - v^2/c^2)^{3/2}} = \frac{c^3}{2 \cdot a' \cdot (c - v)^{1/2} \cdot (c + v)^{3/2}} > \frac{c}{2^{5/2} \cdot a'}$
The last expression is a positive constant independent of v, which means that the particle can never reach the speed of light this way, as there are infinitely many sections.
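The bound can be illustrated numerically. The sketch below relies on one step that is not spelled out in the text: integrating dv/dt = a’ · (1 – v²/c²)^(3/2) from rest gives the coordinate time t(v) = v / (a’ · √(1 – v²/c²)) needed to reach speed v. The function names and the units (c = a’ = 1) are illustrative assumptions.

```python
import math

def time_to_reach(v: float, a_prime: float = 1.0, c: float = 1.0) -> float:
    """Coordinate time in K needed to go from rest to speed v < c,
    obtained by integrating dv/dt = a' * (1 - v^2/c^2)^(3/2)."""
    return v / (a_prime * math.sqrt(1.0 - (v / c) ** 2))

def section_durations(n: int, a_prime: float = 1.0, c: float = 1.0):
    """Durations in K of the first n sections [v; v + (c - v) / 2)."""
    durations = []
    v = 0.0
    for _ in range(n):
        v_next = v + (c - v) / 2.0
        durations.append(time_to_reach(v_next, a_prime, c)
                         - time_to_reach(v, a_prime, c))
        v = v_next
    return durations

lower_bound = 1.0 / 2 ** 2.5  # c / (2^(5/2) * a') with c = a' = 1
durations = section_durations(20)
print("lower bound per section:", lower_bound)
print("first five durations:", durations[:5])
print("total time after 20 sections:", sum(durations))
```

Every section takes longer than the constant lower bound, so the total time grows without limit as more sections are added.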
Now let’s consider the case when the force is not constant but its magnitude is increasing with v, namely let $F = m \cdot (1\ \mathrm{m/s^2}) / (1 - v^2/c^2)^{3/2}$, i.e. a function of the speed already reached. Then in K, a = 1 m/s² all the time, so the particle’s speed would eventually reach c after having been accelerated during the time interval of [0; c / (1 m/s²)), i.e. for about 3 · 10⁸ seconds. Within this (half-open) interval, F is always of finite value (although it grows without bound as v → c). Therefore, this way of accelerating is feasible in theory; there is no such law in mechanics that would limit the value of F.
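To see how quickly the required force grows, here is a small Python sketch. It takes over the essay’s own premise that F = m · a’ in the momentarily comoving frame; the function name `required_force`, the parameter `a0` (the target acceleration of 1 m/s² in K), and the sample speeds are assumptions made for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def required_force(m: float, v: float, a0: float = 1.0, c: float = C) -> float:
    """Force in newtons needed at speed v to keep the acceleration in K
    at a0, assuming (as the text does) F = m * a' in the comoving frame."""
    return m * a0 / (1.0 - (v / c) ** 2) ** 1.5

for fraction in (0.0, 0.6, 0.9, 0.99, 0.9999):
    v = fraction * C
    t = v / 1.0  # seconds needed in K at a constant 1 m/s^2
    force = required_force(1.0, v)
    print(f"v = {fraction} c after {t:.3e} s: F = {force:.3e} N")
```

The force stays finite at every speed below c, but it increases steeply as the speed approaches c.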
Note: energy is out of scope in this essay.
An event can be fully localized by providing a space-time point that consists of the event’s spatial and temporal coordinates, relative to any one inertial frame. Let $A = (x_A, y_A, z_A, t_A)$ and $B = (x_B, y_B, z_B, t_B)$ be two such points relative to K, and let $A' = (x'_{A'}, y'_{A'}, z'_{A'}, t'_{A'})$ and $B' = (x'_{B'}, y'_{B'}, z'_{B'}, t'_{B'})$ denote the corresponding points relative to K’, respectively. If $\Delta x := x_B - x_A$, and similar notation is used for the other deltas too, we can write:
$\Delta x' = (\Delta x - v \cdot \Delta t) / \sqrt{1 - v^2/c^2}$
$\Delta t' = (\Delta t - \Delta x \cdot v / c^2) / \sqrt{1 - v^2/c^2}$
$\Delta y' = \Delta y$
$\Delta z' = \Delta z$
Now, multiply both sides of the second equation by c, then square the first two equations, and after that subtract the second from the first to get:
${\Delta x'}^2 - (c \cdot \Delta t')^2 = \Delta x^2 - (c \cdot \Delta t)^2$
Finally, adding the squared third and fourth equations to both sides leads to an invariant measure between space-time points:
$d^2(A, B) := \Delta x^2 + \Delta y^2 + \Delta z^2 - (c \cdot \Delta t)^2$
Apart from the fact that it can be negative, it resembles a (squared) distance measure. It is absolute like classical distance in that its value does not change when switching from one inertial frame to the other. That is, $d^2(A, B) = d^2(A', B')$ always holds, for arbitrary K’. This suggests that space-time points constitute, at least in a mathematical sense, a four-dimensional absolute space, and that very same space is observed from all inertial frames. (In classical mechanics, the spatial and the temporal points form two absolute spaces, a three- and a one-dimensional one, respectively, which are independent of each other.)
As for the interpretation of $d^2(A, B)$: if it’s positive, it means that two events at A and B, respectively, cannot have any causal relationship. If e.g. $t_B > t_A$, even a light signal that is sent from A would be too slow to influence the event at B.
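The invariance can be verified on a concrete pair of events with a short Python sketch. The function names `to_K_prime` and `interval_squared`, the units (c = 1), and the sample separation are assumptions chosen for illustration.

```python
import math

def to_K_prime(dx, dy, dz, dt, v, c=1.0):
    """Transform the coordinate differences between two events from K to K',
    where K' moves at v along the x-axis of K."""
    root = math.sqrt(1.0 - (v / c) ** 2)
    return ((dx - v * dt) / root, dy, dz, (dt - dx * v / c**2) / root)

def interval_squared(dx, dy, dz, dt, c=1.0):
    """The invariant measure: dx^2 + dy^2 + dz^2 - (c * dt)^2."""
    return dx**2 + dy**2 + dz**2 - (c * dt) ** 2

separation = (3.0, 1.0, 2.0, 1.0)        # (dx, dy, dz, dt) in K
transformed = to_K_prime(*separation, v=0.6)

print(interval_squared(*separation))      # measure in K
print(interval_squared(*transformed))     # same measure in K'
print(transformed[3])                     # dt' in K'
```

In this example the measure is positive, and the time difference changes sign between K and K’: the two events happen in opposite order in the two frames, which is consistent with their having no causal relationship.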
Properties of space and time
The results of special relativity suggest that absolute space and absolute time do not exist separately. What seems to exist is an absolute structure that contains both, but in an intrinsically inseparable way. This structure is called spacetime. In a single inertial frame, however, space and time can be treated as if they were fully separate things. For one (inertial) observer alone, space and time look exactly like those of classical mechanics.
Every event identifies a location in spacetime. It was possible to provide a “distance” measure between events (i.e. between their spacetime locations) in a way that it’s inertial frame invariant. With that, spacetime took the shape of an absolute, four-dimensional mathematical space. What we still don’t know at this point is whether this kind of spacetime would remain tenable if the theory was extended to non-inertial reference frames too.
The limits of rotation
Let D be a disk rotating in K, at angular velocity ω. Then, the radius r of D has to be smaller than $r_c = c / \omega$, otherwise the particles on the periphery would have a speed of $v_r = r \cdot \omega \ge r_c \cdot \omega = c$.
This sounds counter-intuitive, because from classical mechanics we expect that the magnitude of the centripetal force acting upon a particle of mass m > 0 at the periphery of radius $r_c$ would be $F_c = m \cdot c^2 / r_c$, which is a finite quantity. Thus, for an observer on D it would definitely seem possible to gradually extend D until it reaches any target radius $r \ge r_c$, since the centrifugal force to overcome during the process would be limited.
However, according to special relativity, in a momentarily comoving inertial frame of a particle that is on the periphery of D, the magnitude of the particle’s acceleration is $a' = a / (1 - v_r^2 / c^2)$, where $a = v_r^2 / r$ is the magnitude of the centripetal acceleration observed in K. So the magnitude of the centripetal force “felt” by the particle is $F' = m \cdot a'$, which converges to infinity if $r \to r_c$.
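The divergence is easy to see numerically. In the following Python sketch the function name `felt_acceleration` and the units (c = 1 and ω = 1, so that the critical radius is 1) are assumptions made for illustration.

```python
def felt_acceleration(r: float, omega: float, c: float = 1.0) -> float:
    """Acceleration a' of a particle on the periphery of the rotating disk,
    measured in its momentarily comoving inertial frame."""
    v_r = r * omega
    if v_r >= c:
        raise ValueError("radius must be smaller than c / omega")
    a = v_r**2 / r                     # centripetal acceleration seen in K
    return a / (1.0 - (v_r / c) ** 2)

# With c = 1 and omega = 1 the critical radius is 1.
for r in (0.5, 0.9, 0.99, 0.999):
    classical = r * 1.0**2             # a = r * omega^2 stays bounded
    print(f"r = {r}: a = {classical:.3f}, a' = {felt_acceleration(r, 1.0):.3f}")
```

While the acceleration observed in K stays below c · ω, the acceleration felt by the particle grows without bound as the radius approaches the critical value.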
Do we understand relativity?
The major obstacle in coming to terms with relativity theory is the objection from our own intuition. In addition to that, the formulas of special relativity are apparently more complicated than those of classical mechanics. This all happens because we seek an understanding of new phenomena in terms of such mathematical and physical concepts, and even senses, that developed for a very long time against a fundamentally different backdrop.
It does not help either that special relativity discovers more the “what” than the “why”. As a logical process, it derives the strange consequences of a counter-intuitive assumption about light. It would be better for understanding if one could derive the same consequences from a more intuitive assumption instead. (It’s not as hopeless as it sounds: take, for example, the many surprising consequences of the non-surprising law of conservation of angular momentum.)
In the future, when relativistic effects become more and more a common experience, the concepts and formulations of the theory, as well as our intuition, will adapt and hopefully make relativity look simpler and more intuitive to grasp. Nevertheless, as suggested by the quotation under the title of this essay, we never really understand a theory. It can only become intuitive at best, once we’ve managed to get used to it.
Reality and illusion
In the light of relativity theory, we can say that our perception of absolute time is an illusion: beyond the scope of classical mechanics, empirical evidence does not support it any longer. The illusion arises due to restrictions prevailing in our environment: the relatively low speeds and short distances of physical objects in our everyday life, coupled with the limited accuracy of our senses and measurements. The same applies to absolute space, and to the independence of space and time.
As soon as the restrictions are relaxed, we cease to perceive space and time as independent, absolute entities, and a unified, absolute spacetime appears instead. There is no such thing anymore as an observer-independent, globally passing time.
In a sense, everything, or rather, every property we perceive is an illusion and ceases to exist as soon as the horizon of our perception and measurements sufficiently broadens. (On a related note: is it due to causality that no signal can travel faster than light, or is it because nobody has ever seen a signal traveling faster than light that causality appears as a law of nature?)
Does then special relativity capture the real nature of space and time? Well, on the one hand it definitely does, in its own scope. But on the other hand it does not, for the real nature of space and time is that they, eventually, don’t exist.
Within the scope of special relativity, space and time appear as follows.
Two observers moving relative to each other may judge the simultaneity of the same pair of events differently.
Between two encounters of the observers, their clocks typically measure different durations. In the special case when one of them stays at rest in an inertial frame, the other’s clock will always show less time elapsed when they meet again.
From the perspective of an observer moving at velocity v ≠ 0 relative to an inertial frame K, the time needed to cover a distance d in K is $\Delta t' = \Delta t \cdot \sqrt{1 - v^2 / c^2}$, where Δt = d / v is the time elapsed in terms of the coordinate time of K. That is, Δt’ < Δt.
Basically, every result of special relativity is the consequence of the previous paragraph.
Two observers moving relative to each other may judge the spatial distance between the same pair of events differently.
In an inertial frame K, the faster an object is moving, the more its length contracts relative to K along the direction of movement. On the other hand, there is no change in size along perpendicular directions (and planes).
From the perspective of an observer moving at velocity v ≠ 0 relative to an inertial frame K, every distance d in K along v becomes $d' = d \cdot \sqrt{1 - v^2 / c^2}$. This follows from what was said about time, since d = v · Δt and d’ = v · Δt’. That is, d’ < d.
Non-inertial reference frames are out of scope in special relativity.
The perspective of a non-uniformly moving observer is calculated by decomposing their motion into sections si during which they can be regarded as comoving with a respective inertial frame Ki.
The co-located clock of Ki then becomes, temporarily along section si, the observer’s “own clock”.